Summary
The background _session_expiry_watcher repeatedly flushes memories for the same expired sessions after every gateway restart, causing a flood of LLM API calls and log noise.
Root Cause
_session_expiry_watcher (gateway/run.py L1015) iterates sessions.json entries, flushes memories for expired sessions, then records them in self.session_store._pre_flushed_sessions — an in-memory set() (gateway/session.py L475). However:
- Flushed sessions are never removed from sessions.json
- _pre_flushed_sessions is not persisted to disk
- On gateway restart, the set resets to empty, so every expired session is re-flushed
This creates an O(N) flush storm on every restart, where N = number of expired-but-not-removed session entries. Each flush spawns a temporary AIAgent with an LLM call (the "This session is about to be automatically reset..." prompt), so the cost scales linearly with stale entries.
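The restart behaviour can be reproduced in isolation. The following is a minimal sketch, not the gateway's code: `ToyStore`, `run_watcher_once` and `simulate` are hypothetical stand-ins, and only the `_pre_flushed_sessions` name is taken from the source. It illustrates that an in-memory set deduplicates within one process lifetime but not across restarts.

```python
import json
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical stand-in for the session store (not the real class)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Entries are reloaded from disk on every "restart".
        self.entries = json.loads(path.read_text())
        # The dedup set lives only in memory, so it starts empty each time.
        self._pre_flushed_sessions = set()


def run_watcher_once(store: ToyStore) -> int:
    """Flush every expired session not yet in the set; return the flush count."""
    flushed = 0
    for entry in store.entries.values():
        sid = entry["session_id"]
        if entry["expired"] and sid not in store._pre_flushed_sessions:
            flushed += 1  # in the real gateway this would be an LLM call
            store._pre_flushed_sessions.add(sid)
    return flushed


def simulate(n_expired: int, restarts: int, ticks_per_run: int = 3) -> list:
    """Return the number of flushes performed in each process lifetime."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sessions.json"
        path.write_text(json.dumps({
            f"key{i}": {"session_id": f"s{i}", "expired": True}
            for i in range(n_expired)
        }))
        counts = []
        for _ in range(restarts):
            store = ToyStore(path)  # a restart: fresh object, empty set
            counts.append(
                sum(run_watcher_once(store) for _ in range(ticks_per_run))
            )
        return counts
```

In this toy model, `simulate(5, 3)` performs 5 flushes in each of the 3 lifetimes even though nothing new expired in between. The set suppresses repeats only on the later ticks within a single run.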
Reproduction
- Use the gateway for a few days with session_reset.mode: both and idle_minutes: 1440
- Accumulate expired sessions (naturally happens with Telegram topic-per-thread routing)
- Restart the gateway
- Observe repeated "Session XXX expired (key=...), flushing memories proactively" log entries for already-flushed sessions, each triggering an LLM API call
Impact
- Unnecessary LLM API spend on redundant memory flushes
- Log noise (flood of "expired" + "Pre-reset memory flush completed" lines)
- Downstream router (e.g. smart router proxy) flooded with scoring requests
- Watcher blocks on sequential flushes (~10-15s each), delaying processing of legitimately new expirations
Suggested Fix
After a successful flush + _pre_flushed_sessions.add(session_id), also remove the entry from sessions.json (or mark it as flushed with a persisted flag). The simplest approach:
# In _session_expiry_watcher, after successful flush:
await self._async_flush_memories(entry.session_id, key)
self._shutdown_gateway_honcho(key)
self.session_store._pre_flushed_sessions.add(entry.session_id)
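# NEW: remove the stale entry so it does not re-flush on restart.
# pop() tolerates a key that is already gone; if the surrounding loop
# iterates _entries directly, iterate over a copy (e.g. list(...items())).
self.session_store._entries.pop(key, None)
self.session_store._save()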
Alternatively, persist _pre_flushed_sessions to disk alongside sessions.json.
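The persistence alternative could look roughly like the sketch below. The file name `pre_flushed.json` and the helpers `load_pre_flushed` / `save_pre_flushed` are hypothetical and do not exist in the codebase; the real store may prefer a flag inside sessions.json instead. Writing to a temporary file and renaming it is meant to keep a crash mid-write from leaving a truncated file.

```python
import json
import os
import tempfile
from pathlib import Path


def load_pre_flushed(path: Path) -> set:
    """Load the flushed session IDs; a missing or corrupt file means 'none'."""
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return set()
    return set(data) if isinstance(data, list) else set()


def save_pre_flushed(path: Path, session_ids: set) -> None:
    """Write the set as a sorted JSON list via a temp file plus rename."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(sorted(session_ids), fh)
        os.replace(tmp_name, path)  # atomic when both are on one filesystem
    except BaseException:
        # Remove the temp file if writing or renaming failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Under these assumptions, the store would call `load_pre_flushed` once at startup to seed `_pre_flushed_sessions` and `save_pre_flushed` after each successful flush. The set would also need pruning when entries leave sessions.json, otherwise it grows without bound.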
Workaround
Manually prune expired entries from ~/.hermes/sessions/sessions.json and restart the gateway.
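The manual pruning can be scripted. This is a sketch under assumptions the source does not confirm: it treats sessions.json as a JSON object mapping session keys to entry objects, and assumes each entry carries an ISO-8601 `updated_at` timestamp. Check the actual file layout first, and stop the gateway before running it so the file is not rewritten underneath the script. The function name `prune_idle_entries` is hypothetical.

```python
import json
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def _parse(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp; naive values are read as local time."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


def prune_idle_entries(
    path: Path, idle_minutes: int, now: Optional[datetime] = None
) -> int:
    """Drop entries idle longer than idle_minutes; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=idle_minutes)
    entries = json.loads(path.read_text())

    kept = {}
    for key, entry in entries.items():
        ts = entry.get("updated_at") if isinstance(entry, dict) else None
        try:
            stale = ts is not None and _parse(ts) < cutoff
        except (TypeError, ValueError):
            stale = False  # keep anything that cannot be interpreted
        if not stale:
            kept[key] = entry

    removed = len(entries) - len(kept)
    if removed:
        # Keep a backup next to the original before rewriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(json.dumps(kept, indent=2))
    return removed
```

For the configuration in this report, a call such as `prune_idle_entries(Path.home() / ".hermes/sessions/sessions.json", idle_minutes=1440)` would apply. The sketch covers only the idle-timeout half of `mode: both`; entries expired by the daily `at_hour` reset would need a separate check.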
Environment
- hermes-agent @ HEAD (NousResearch/hermes-agent)
- macOS, launchd-managed gateway
- session_reset: {mode: both, idle_minutes: 1440, at_hour: 4}