_session_expiry_watcher re-flushes expired sessions on every gateway restart #2506

@tumf

Summary

The background _session_expiry_watcher repeatedly flushes memories for the same expired sessions after every gateway restart, causing a flood of LLM API calls and log noise.

Root Cause

_session_expiry_watcher (gateway/run.py L1015) iterates sessions.json entries, flushes memories for expired sessions, then records them in self.session_store._pre_flushed_sessions — an in-memory set() (gateway/session.py L475). However:

  1. Flushed sessions are never removed from sessions.json
  2. _pre_flushed_sessions is not persisted to disk
  3. On gateway restart, the set resets to empty, so every expired session is re-flushed

This creates an O(N) flush storm on every restart, where N = number of expired-but-not-removed session entries. Each flush spawns a temporary AIAgent with an LLM call (the "This session is about to be automatically reset..." prompt), so the cost scales linearly with stale entries.
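The restart behaviour can be illustrated with a small toy model. This is only a sketch: `ToyStore`, `run_watcher_once`, `simulate`, and the `expired` field are hypothetical stand-ins, not the real gateway classes or the real sessions.json schema. The point is that entries are reloaded from disk on every restart while the "already flushed" set starts empty again.

```python
import json
import tempfile
from pathlib import Path


class ToyStore:
    """Hypothetical stand-in for the session store; not the real class."""

    def __init__(self, path: Path):
        self.path = path
        # Entries are reloaded from disk, so they survive a restart.
        self.entries = json.loads(path.read_text()) if path.exists() else {}
        # In-memory only: starts empty again after every restart.
        self.pre_flushed = set()


def run_watcher_once(store, flush_log):
    """One watcher pass: flush every expired session not yet marked flushed."""
    for entry in store.entries.values():
        if entry["expired"] and entry["session_id"] not in store.pre_flushed:
            flush_log.append(entry["session_id"])  # stands in for one LLM call
            store.pre_flushed.add(entry["session_id"])


def simulate(restarts: int, n_expired: int) -> int:
    """Return the total number of flushes across the given number of restarts."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sessions.json"
        path.write_text(json.dumps({
            f"key-{i}": {"session_id": f"s{i}", "expired": True}
            for i in range(n_expired)
        }))
        flush_log = []
        for _ in range(restarts):
            store = ToyStore(path)              # a "restart" rebuilds the store
            run_watcher_once(store, flush_log)  # re-flushes every stale entry
            run_watcher_once(store, flush_log)  # same process: set prevents repeats
        return len(flush_log)
```

In this model the set only deduplicates within one process lifetime, so the total flush count is restarts × stale entries rather than one flush per session.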

Reproduction

  1. Use the gateway for a few days with session_reset.mode: both and idle_minutes: 1440
  2. Accumulate expired sessions (naturally happens with Telegram topic-per-thread routing)
  3. Restart the gateway
  4. Observe repeated "Session XXX expired (key=...), flushing memories proactively" log entries for sessions that were already flushed, each triggering an LLM API call

Impact

  • Unnecessary LLM API spend on redundant memory flushes
  • Log noise (flood of "expired" + "Pre-reset memory flush completed" lines)
  • Downstream router (e.g. smart router proxy) flooded with scoring requests
  • Watcher blocks on sequential flushes (~10-15s each), delaying processing of legitimately new expirations

Suggested Fix

After a successful flush + _pre_flushed_sessions.add(session_id), also remove the entry from sessions.json (or mark it as flushed with a persisted flag). The simplest approach:

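# In _session_expiry_watcher, after successful flush:
await self._async_flush_memories(entry.session_id, key)
self._shutdown_gateway_honcho(key)
self.session_store._pre_flushed_sessions.add(entry.session_id)
# NEW: remove the stale entry so it does not re-flush on restart.
# Note: the surrounding loop must iterate over a snapshot, e.g.
# list(self.session_store._entries.items()); removing keys from a dict
# while iterating it directly raises RuntimeError.
self.session_store._entries.pop(key, None)
self.session_store._save()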

Alternatively, persist _pre_flushed_sessions to disk alongside sessions.json.
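A minimal sketch of that alternative, assuming the flushed session IDs are stored as a JSON list in a separate file next to sessions.json. The helper names `load_flushed` and `save_flushed` and the file name are hypothetical and do not exist in the codebase.

```python
import json
import os
from pathlib import Path


def load_flushed(path: Path) -> set:
    """Load the persisted set of flushed session IDs; empty if missing or corrupt."""
    try:
        return set(json.loads(path.read_text()))
    except (FileNotFoundError, json.JSONDecodeError):
        return set()


def save_flushed(path: Path, flushed: set) -> None:
    """Write the set as a sorted JSON list, replacing the old file atomically."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(sorted(flushed)))
    # os.replace is atomic when source and target are on the same filesystem,
    # so a crash mid-write cannot leave a half-written file behind.
    os.replace(tmp, path)
```

The watcher would call `load_flushed` once at startup and `save_flushed` after each successful flush. One trade-off compared with deleting the entry: the persisted set grows without bound unless IDs are dropped when their sessions.json entries are eventually removed.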

Workaround

Manually prune expired entries from ~/.hermes/sessions/sessions.json and restart the gateway.
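The pruning step could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes sessions.json is a JSON object mapping session keys to entries, and that each entry carries an ISO-8601 `updated_at` timestamp. Neither the field name nor the layout is confirmed here, so check the actual file before adapting it. It models only the idle timeout, not the daily at_hour reset.

```python
from datetime import datetime, timedelta


def prune_expired(entries: dict, now: datetime, idle_minutes: int) -> dict:
    """Return only the entries whose last activity is within the idle window.

    Assumes each entry has an ISO-8601 "updated_at" string (hypothetical field)
    in the same timezone convention as `now`.
    """
    cutoff = now - timedelta(minutes=idle_minutes)
    return {
        key: entry
        for key, entry in entries.items()
        if datetime.fromisoformat(entry["updated_at"]) >= cutoff
    }
```

To apply it, stop the gateway, back up ~/.hermes/sessions/sessions.json, load it with `json.load`, pass the result through `prune_expired` with `idle_minutes=1440`, write the output back, and then start the gateway again.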

Environment

  • hermes-agent @ HEAD (NousResearch/hermes-agent)
  • macOS, launchd-managed gateway
  • session_reset: {mode: both, idle_minutes: 1440, at_hour: 4}
