fix(state.db): recover from malformed sqlite_master so hidden sessions reappear #43149
Merged
The corruption class behind "Desktop/Dashboard show no sessions while
hundreds of session files sit on disk" is a malformed sqlite_master — most
often a duplicate object row, e.g. two CREATE VIRTUAL TABLE messages_fts
entries — surfacing as:
sqlite3.DatabaseError: malformed database schema (messages_fts) - table messages_fts already exists
SQLite parses the whole schema while preparing the FIRST statement on a
connection, so on this class every statement fails before it runs: PRAGMA
journal_mode (which is where SessionDB.__init__ actually trips, in
apply_wal_with_fallback, BEFORE _init_schema), PRAGMA integrity_check, and
even DROP TABLE. The only operations that still work are
PRAGMA writable_schema=ON plus direct sqlite_master surgery. A plain
FTS-index rebuild at the _init_schema layer therefore cannot reach or fix
this; the canonical sessions/messages rows are intact — only the derived
schema is broken.
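A minimal sketch of this failure class with Python's built-in sqlite3 module. It is not code from this PR: make_malformed_db, first_statement_error and count_schema_rows are hypothetical helpers, a plain table named messages_fts stands in for the FTS5 virtual table so FTS5 does not need to be compiled in, and it assumes a SQLite build where SQLITE_DBCONFIG_DEFENSIVE is off or can be turned off (Python 3.12+ exposes Connection.setconfig() for that).

```python
import os
import sqlite3
import tempfile
from typing import Optional


def _allow_schema_writes(conn: sqlite3.Connection) -> None:
    # writable_schema=ON is silently ignored while SQLITE_DBCONFIG_DEFENSIVE is
    # on. Python 3.12+ can switch it off; older builds rely on the SQLite default.
    if hasattr(sqlite3, "SQLITE_DBCONFIG_DEFENSIVE"):
        conn.setconfig(sqlite3.SQLITE_DBCONFIG_DEFENSIVE, False)


def make_malformed_db(path: str) -> None:
    """Write a DB whose sqlite_master lists messages_fts twice (hypothetical helper)."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit
    conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO sessions (title) VALUES ('kept')")
    # Plain table standing in for the FTS5 virtual table (no FTS5 needed).
    conn.execute("CREATE TABLE messages_fts (body TEXT)")
    _allow_schema_writes(conn)
    conn.execute("PRAGMA writable_schema=ON")
    conn.execute(
        "INSERT INTO sqlite_master "
        "SELECT * FROM sqlite_master WHERE name = 'messages_fts'"
    )
    conn.execute("PRAGMA writable_schema=OFF")
    conn.close()


def first_statement_error(path: str, sql: str) -> Optional[str]:
    """Run one statement on a fresh connection and return its error text, if any."""
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.DatabaseError as exc:
        return str(exc)
    finally:
        conn.close()


def count_schema_rows(path: str, name: str) -> int:
    """Read sqlite_master despite the broken schema, via writable_schema=ON."""
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        _allow_schema_writes(conn)
        conn.execute("PRAGMA writable_schema=ON")  # does not parse the schema
        sql = "SELECT count(*) FROM sqlite_master WHERE name = ?"
        try:
            return conn.execute(sql, (name,)).fetchone()[0]
        except sqlite3.DatabaseError:
            # The statement that first loads the schema may fail; retry once.
            return conn.execute(sql, (name,)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "state.db")
    make_malformed_db(db_path)
    for stmt in (
        "PRAGMA journal_mode=WAL",
        "PRAGMA integrity_check",
        "DROP TABLE messages_fts",
        "SELECT count(*) FROM sessions",
    ):
        print(f"{stmt}: {first_statement_error(db_path, stmt)}")
    print("messages_fts rows:", count_schema_rows(db_path, "messages_fts"))
```

Each statement in the loop should come back with a "malformed database schema" error on a fresh connection, while the writable_schema read still sees both messages_fts rows.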
Add a dedicated recovery that operates where the failure actually happens:
- hermes_state.repair_state_db_schema(): backs up the raw file first, then a
least-destructive ladder — (1) de-duplicate sqlite_master keeping the
lowest rowid per object (preserves the existing FTS index), escalating to
(2) drop every messages_fts* schema object + VACUUM and let the next open
rebuild the FTS index from messages. sessions/messages are never modified.
Also adds is_malformed_db_error() to discriminate this class.
- SessionDB.__init__ auto-heals: on a malformed-schema open error it repairs
once (process-guarded against loops / concurrent web_server opens) and
reopens, so Desktop/Dashboard recover on their own instead of silently
showing "no sessions".
- hermes doctor --fix detects the malformed class and repairs it (reporting
the recovered session count + backup name).
- hermes sessions repair [--check-only] [--no-backup] runs on the raw file
path, since SessionDB() itself cannot open a malformed DB.
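The ladder above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the PR's repair_state_db_schema(): probe, backup_raw_file, delete_schema_rows, repair_schema, the `.bak-<timestamp>` backup name and the `drop_fts` label are hypothetical (`dedup_schema` mirrors the label in the test plan), is_malformed_schema_error only stands in for is_malformed_db_error() by matching the error text, and strategy (2) assumes every FTS object's name or owning table starts with messages_fts.

```python
import os
import shutil
import sqlite3
import time
from typing import Optional

FTS_PREFIX = "messages_fts"  # FTS5 shadow tables (messages_fts_data, ...) share it


def is_malformed_schema_error(exc: Optional[BaseException]) -> bool:
    # Stand-in for is_malformed_db_error(): match only the schema-parse class,
    # not generic page-level corruption ("database disk image is malformed").
    return isinstance(exc, sqlite3.DatabaseError) and (
        "malformed database schema" in str(exc)
    )


def probe(path: str) -> Optional[sqlite3.DatabaseError]:
    """Return the error a first statement raises on a fresh connection, or None."""
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        return None
    except sqlite3.DatabaseError as exc:
        return exc
    finally:
        conn.close()


def backup_raw_file(path: str) -> str:
    """Copy the raw file (and any -wal sidecar) without opening it via SQLite."""
    backup = f"{path}.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copy2(path, backup)
    if os.path.exists(path + "-wal"):
        shutil.copy2(path + "-wal", backup + "-wal")
    return backup


def delete_schema_rows(path: str, where: str) -> int:
    """DELETE matching sqlite_master rows under writable_schema=ON."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit
    try:
        if hasattr(sqlite3, "SQLITE_DBCONFIG_DEFENSIVE"):  # Python 3.12+
            conn.setconfig(sqlite3.SQLITE_DBCONFIG_DEFENSIVE, False)
        conn.execute("PRAGMA writable_schema=ON")  # needs no parsed schema
        sql = f"DELETE FROM sqlite_master WHERE {where}"
        try:
            removed = conn.execute(sql).rowcount
        except sqlite3.DatabaseError:
            # The statement that first loads the broken schema may still fail;
            # a retry compiles against whatever part of the schema did load.
            removed = conn.execute(sql).rowcount
        conn.execute("PRAGMA writable_schema=OFF")
        return removed
    finally:
        conn.close()


def repair_schema(path: str) -> dict:
    err = probe(path)
    if err is None:
        return {"strategy": None}  # clean DB: repair is a no-op
    if not is_malformed_schema_error(err):
        raise err  # a different corruption class; not handled here
    backup = backup_raw_file(path)

    # (1) De-duplicate: keep the lowest rowid per (type, name).
    delete_schema_rows(
        path,
        "rowid NOT IN (SELECT min(rowid) FROM sqlite_master GROUP BY type, name)",
    )
    if probe(path) is None:
        return {"strategy": "dedup_schema", "backup": backup}

    # (2) Drop every messages_fts* object (plus indexes/triggers on them), then
    # VACUUM so the b-tree pages those rows pointed at are discarded.
    delete_schema_rows(
        path, f"name GLOB '{FTS_PREFIX}*' OR tbl_name GLOB '{FTS_PREFIX}*'"
    )
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    err = probe(path)
    if err is not None:
        raise err  # unrepairable: the backup stays in place for manual recovery
    return {"strategy": "drop_fts", "backup": backup}
```

Only sqlite_master rows are deleted, so the sessions and messages rows are never edited, although VACUUM does copy them into a fresh file. Rebuilding the FTS index after strategy (2) is left to the next normal open.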
Supersedes #32589 and #33869: both targeted FTS corruption but gated their
repair behind statements (integrity_check / SELECT / DROP TABLE) that
themselves fail on this class, and neither addressed the apply_wal_with_fallback
open-time failure. Credit preserved via Co-authored-by.
Closes #33865.
Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com>
Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>
🔎 Lint report:

| Rule | Count |
|---|---|
| unresolved-attribute | 21 |
| invalid-argument-type | 1 |
| unresolved-import | 1 |
First entries
tests/gateway/test_telegram_topic_mode.py:1333: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/test_hermes_state.py:3584: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
hermes_state.py:1169: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
tests/hermes_cli/test_web_server.py:662: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
hermes_cli/web_server.py:8254: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/gateway/test_telegram_topic_mode.py:1100: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
hermes_state.py:821: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `Connection`, found `None | Connection`
hermes_cli/main.py:11272: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
hermes_state.py:825: [unresolved-attribute] unresolved-attribute: Attribute `rollback` is not defined on `None` in union `None | Connection`
hermes_state.py:998: [unresolved-attribute] unresolved-attribute: Attribute `cursor` is not defined on `None` in union `None | Connection`
tests/hermes_state/test_resolve_resume_session_id.py:30: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/hermes_state/test_resolve_resume_session_id.py:34: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
tests/test_state_db_malformed_repair.py:19: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/hermes_cli/test_resolve_last_session.py:148: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
hermes_cli/cli_agent_setup_mixin.py:504: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/tools/test_session_search.py:459: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/agent/test_compression_concurrent_fork.py:85: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/hermes_cli/test_resolve_last_session.py:152: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
hermes_state.py:4562: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/tools/test_session_search.py:509: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
hermes_cli/cli_agent_setup_mixin.py:509: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `None | Connection`
tests/hermes_cli/test_web_server.py:659: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
tests/test_state_db_malformed_repair.py:117: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `None | Connection`
✅ Fixed issues (20):

| Rule | Count |
|---|---|
| unresolved-attribute | 19 |
| invalid-argument-type | 1 |
First entries
tests/hermes_state/test_resolve_resume_session_id.py:34: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
hermes_state.py:591: [unresolved-attribute] unresolved-attribute: Attribute `rollback` is not defined on `None` in union `Connection | None`
tests/hermes_cli/test_resolve_last_session.py:148: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
hermes_state.py:587: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `Connection`, found `Connection | None`
hermes_cli/cli_agent_setup_mixin.py:504: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/tools/test_session_search.py:459: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/agent/test_compression_concurrent_fork.py:85: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
hermes_state.py:4328: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/hermes_cli/test_web_server.py:659: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/hermes_cli/test_resolve_last_session.py:152: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
tests/gateway/test_telegram_topic_mode.py:1333: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/test_hermes_state.py:3584: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/tools/test_session_search.py:509: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
hermes_cli/web_server.py:8254: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
hermes_cli/cli_agent_setup_mixin.py:509: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
hermes_state.py:935: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
tests/hermes_cli/test_web_server.py:662: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
tests/hermes_state/test_resolve_resume_session_id.py:30: [unresolved-attribute] unresolved-attribute: Attribute `execute` is not defined on `None` in union `Connection | None`
tests/gateway/test_telegram_topic_mode.py:1100: [unresolved-attribute] unresolved-attribute: Attribute `commit` is not defined on `None` in union `Connection | None`
hermes_state.py:764: [unresolved-attribute] unresolved-attribute: Attribute `cursor` is not defined on `None` in union `Connection | None`
Unchanged: 5541 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
This was referenced Jun 9, 2026
itskaism pushed a commit to itskaism/hermes-agent that referenced this pull request on Jun 10, 2026:
…s reappear (NousResearch#43149) * fix(state.db): recover from malformed sqlite_master so hidden sessions reappear The corruption class behind "Desktop/Dashboard show no sessions while hundreds of session files sit on disk" is a malformed sqlite_master — most often a duplicate object row, e.g. two CREATE VIRTUAL TABLE messages_fts entries — surfacing as: sqlite3.DatabaseError: malformed database schema (messages_fts) - table messages_fts already exists SQLite parses the whole schema while preparing the FIRST statement on a connection, so on this class every statement fails before it runs: PRAGMA journal_mode (which is where SessionDB.__init__ actually trips, in apply_wal_with_fallback, BEFORE _init_schema), PRAGMA integrity_check, and even DROP TABLE. The only operations that still work are PRAGMA writable_schema=ON plus direct sqlite_master surgery. A plain FTS-index rebuild at the _init_schema layer therefore cannot reach or fix this; the canonical sessions/messages rows are intact — only the derived schema is broken. Add a dedicated recovery that operates where the failure actually happens: - hermes_state.repair_state_db_schema(): backs up the raw file first, then a least-destructive ladder — (1) de-duplicate sqlite_master keeping the lowest rowid per object (preserves the existing FTS index), escalating to (2) drop every messages_fts* schema object + VACUUM and let the next open rebuild the FTS index from messages. sessions/messages are never modified. Plus is_malformed_db_error() to discriminate this class. - SessionDB.__init__ auto-heals: on a malformed-schema open error it repairs once (process-guarded against loops / concurrent web_server opens) and reopens, so Desktop/Dashboard recover on their own instead of silently showing "no sessions". - hermes doctor --fix detects the malformed class and repairs it (reporting the recovered session count + backup name). 
- hermes sessions repair [--check-only] [--no-backup] runs on the raw file path, since SessionDB() itself cannot open a malformed DB. Supersedes NousResearch#32589 and NousResearch#33869: both targeted FTS corruption but gated their repair behind statements (integrity_check / SELECT / DROP TABLE) that themselves fail on this class, and neither addressed the apply_wal_with_fallback open-time failure. Credit preserved via Co-authored-by. Closes NousResearch#33865. Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com> Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com> * test(state.db): cover strat-B escalation + unrepairable safe-fail paths --------- Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com> Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com> (cherry picked from commit 218452b)
wachoo pushed a commit to wachoo/hermes-agent that referenced this pull request on Jun 10, 2026.
changman pushed a commit to changman/hermes-agent that referenced this pull request on Jun 10, 2026.
Summary
Fixes the corruption class behind "Desktop/Dashboard show no sessions while hundreds of session files sit on disk". The backend logs `sqlite3.DatabaseError: malformed database schema (messages_fts) - table messages_fts already exists`.

This is a malformed `sqlite_master` (typically a duplicate object row: two `CREATE VIRTUAL TABLE messages_fts` entries), which is a worse class than a malformed FTS inverted index. SQLite parses the entire schema while preparing the first statement on a connection, so on this class every statement fails before it runs; this was reproduced for `PRAGMA journal_mode`, `PRAGMA integrity_check`, and even `DROP TABLE`.

Crucially, the error fires in `apply_wal_with_fallback()` (`hermes_state.py`), before `_init_schema()` runs, so a plain FTS rebuild at the schema-init layer can neither reach nor fix it. The canonical `sessions`/`messages` rows are intact; only the derived schema is broken.

Changes
- `hermes_state.repair_state_db_schema()` backs up the raw file first, then applies a least-destructive ladder: (1) de-duplicate `sqlite_master`, keeping the lowest rowid per object (leaves the existing FTS index intact); escalating to (2) drop every `messages_fts*` schema object + `VACUUM`, and let the next open rebuild the FTS index from `messages`. `sessions`/`messages` are never modified. Also adds `is_malformed_db_error()` to discriminate this class.
- `SessionDB.__init__` auto-heals: on a malformed-schema open error it repairs once (process-guarded against loops and concurrent `web_server` opens) and reopens, so Desktop/Dashboard recover on their own instead of silently showing "no sessions".
- `hermes doctor --fix` detects the malformed class and repairs it, reporting the recovered session count + backup name.
- `hermes sessions repair [--check-only] [--no-backup]` operates on the raw file path, since `SessionDB()` itself cannot open a malformed DB.

Supersedes

Supersedes #32589 and #33869. Both targeted FTS corruption but gated their repair behind statements (`integrity_check` / `SELECT` / `DROP TABLE`) that themselves fail on this class, and neither addressed the open-time (`apply_wal_with_fallback`) failure. Credit preserved via `Co-authored-by`.

Closes #33865.
Test plan
- `tests/test_state_db_malformed_repair.py` (7 tests): documents that every statement fails on this corruption; repair preserves sessions + messages; rebuilt-index search works; `SessionDB` auto-heals on open; auto-heal is attempted only once per process; clean-DB repair is a no-op.
- `tests/hermes_state/`, `tests/test_hermes_state*.py`, `tests/hermes_cli/test_doctor*.py`, `tests/hermes_cli/test_sessions_delete.py`, `tests/test_lazy_session_regressions.py`: all green (400+ tests).
- `hermes sessions repair --check-only` → reports malformed
- `hermes sessions repair` → backs up, `strategy: dedup_schema`, "1 sessions recovered"
- `hermes sessions list` → recovered session listed
- `hermes doctor --fix` → "Repaired state.db schema (N sessions recovered)"
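The once-per-process auto-heal that the test plan checks could be structured like the sketch below. open_with_auto_heal and its callable parameters are hypothetical: in the PR the open step is `SessionDB.__init__` and the repair step is `repair_state_db_schema()`, whose signatures are not shown here.

```python
import os
import sqlite3
import threading
from typing import Callable, Set, TypeVar

T = TypeVar("T")

_heal_lock = threading.Lock()
_healed_paths: Set[str] = set()  # realpaths that already had their one repair


def open_with_auto_heal(
    path: str,
    open_db: Callable[[str], T],
    repair: Callable[[str], object],
    is_malformed: Callable[[BaseException], bool],
) -> T:
    """Open path; on a malformed-schema error, repair at most once per process."""
    try:
        return open_db(path)
    except sqlite3.DatabaseError as exc:
        if not is_malformed(exc):
            raise  # other errors are not this corruption class
    key = os.path.realpath(path)
    with _heal_lock:
        # Concurrent openers queue here; only the first one runs the repair.
        if key not in _healed_paths:
            _healed_paths.add(key)  # recorded first, so a failing repair is not retried
            repair(path)
    # Reopen exactly once; a second failure propagates instead of looping.
    return open_db(path)
```

Recording the path before calling repair means an unrepairable file fails fast on later opens instead of looping, and taking the lock before the check makes a concurrent opener (for example a second web_server request) wait for the first repair and then simply reopen.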