Bug
When running CLI (hermes --resume) and the Telegram gateway simultaneously, session_search becomes permanently unavailable on the gateway side.
Root Cause
state.db uses WAL mode, which allows concurrent readers but only one writer at a time. When both CLI and gateway write concurrently, SQLite lock contention causes create_session() to fail with database is locked once the 10s timeout expires. The error handler in run_agent.py:895-897 then sets self._session_db = None, permanently disabling session_search for that agent instance. The gateway agent cache (gateway/run.py:5050) reuses this broken agent, so all subsequent messages in that session also lack session_search.
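The contention itself can be shown outside hermes with two plain sqlite3 connections. This is only a sketch: the one-column sessions table, the demo_lock_contention name, and the 0.2s timeout (standing in for the real 10s) are assumptions for illustration, not the actual schema or code in hermes_state.py.

```python
import os
import sqlite3
import tempfile


def demo_lock_contention(timeout_s: float = 0.2) -> str:
    """Return what a second writer sees while another connection holds the write lock."""
    path = os.path.join(tempfile.mkdtemp(), "state.db")

    # "CLI" connection: autocommit mode so transactions are controlled explicitly.
    writer_a = sqlite3.connect(path, isolation_level=None)
    writer_a.execute("PRAGMA journal_mode=WAL")
    writer_a.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")  # hypothetical schema

    # "Gateway" connection: short busy timeout so the demo finishes quickly.
    writer_b = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
    try:
        writer_a.execute("BEGIN IMMEDIATE")  # take the single write lock and keep it
        writer_a.execute("INSERT INTO sessions (id) VALUES ('cli')")
        try:
            writer_b.execute("INSERT INTO sessions (id) VALUES ('gateway')")
            return "ok"
        except sqlite3.OperationalError as exc:
            return str(exc)
    finally:
        if writer_a.in_transaction:
            writer_a.execute("ROLLBACK")
        writer_a.close()
        writer_b.close()


if __name__ == "__main__":
    print(demo_lock_contention())
```

The second INSERT waits for the busy timeout and then raises sqlite3.OperationalError, which is the same class of error create_session() hits in the gateway.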
Steps to Reproduce
- Start the gateway: hermes gateway
- Start a CLI session: hermes --resume <session_id> (or just hermes and do some work)
- While CLI is actively making tool calls (frequent DB writes), send a message on Telegram
- The Telegram agent tries create_session() → SQLite returns database is locked
- _session_db is set to None → session_search returns "Session database not available"
- Due to agent cache, all subsequent Telegram messages in this session also fail
Evidence
- Gateway log shows 🔍 recall "today" 0.0s [error] after the failure
- New Telegram sessions appear in file-based ~/.hermes/sessions/*.json but NOT in state.db
- Restarting the gateway does not fix it if the CLI is still writing
- Direct test confirms the lock: sqlite3 state.db "INSERT INTO sessions ..." hangs when CLI is active
Suggested Fix
Several options (not mutually exclusive):
- Don't null out _session_db on create_session failure: retry or use INSERT OR IGNORE/INSERT OR REPLACE instead of bare INSERT
- Increase SQLite timeout: 10s is too short when CLI is doing frequent flushes; 30-60s would help
- Retry with backoff in create_session before giving up
- Document the limitation: the sessions doc says WAL "suits the gateway's multi-platform architecture" but doesn't mention CLI concurrent usage
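A rough sketch of how the first and third options could combine. insert_session_with_retry, its parameters, and the one-column sessions table are made up for illustration; the real create_session in hermes_state.py takes more fields. The idea is that a lock error leads to a few backed-off retries and a False return, so the caller can keep its _session_db handle and try again on a later message.

```python
import random
import sqlite3
import time


def insert_session_with_retry(conn, session_id, attempts=5, base_delay_s=0.05, sleep=time.sleep):
    """Insert a session row, retrying on lock errors. Returns True once the write commits."""
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                # OR IGNORE makes a repeated insert of the same id harmless.
                conn.execute("INSERT OR IGNORE INTO sessions (id) VALUES (?)", (session_id,))
            return True
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            if "locked" not in message and "busy" not in message:
                raise  # not contention; surface the real error
            if attempt < attempts - 1:
                # Exponential backoff with a little jitter between attempts.
                sleep(base_delay_s * (2 ** attempt) + random.uniform(0, base_delay_s))
    return False  # still locked; caller keeps its DB handle and can retry later
```

On a False return the agent would log the failure and leave _session_db in place rather than setting it to None, so session_search recovers as soon as the other writer lets go.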
Environment
- macOS, hermes-agent from git
- CLI and gateway running simultaneously (common workflow)
- SQLite WAL mode, timeout=10.0s
Relevant Code
run_agent.py:884-897 — create_session failure nulls _session_db
hermes_state.py:257 — bare INSERT INTO sessions (no conflict handling)
gateway/run.py:5044-5052 — agent cache reuses broken agent
hermes_state.py:124-128 — timeout=10.0