fix(gateway): sync cached agent session_id on session reset#3220

Closed
Mibayy wants to merge 1 commit into NousResearch:main from Mibayy:fix/cached-agent-session-id-3210

Mibayy commented Mar 26, 2026

Contributor

Summary

Closes #3210

Fixes Telegram (and all gateway platforms) losing conversation history after a session reset.

Root Cause

Gateway agents are cached by session_key to preserve the frozen system prompt and tool schemas across turns. When a session resets (idle expiry, daily reset, /new), a new session_id is assigned — but the cached AIAgent instance still holds its original session_id.

On the next message after a reset:

  1. get_or_create_session() creates a new session → new session_id
  2. _run_agent() hits the agent cache → reuses the old AIAgent (session_id = old)
  3. agent.run_conversation() completes → _flush_messages_to_session_db() writes to agent.session_id (the old session)
  4. On the following message: load_transcript(new_session_id) returns empty → the model sees only 1 message

This explains the reporter's exact symptom: "sometimes it knows about previous messages" (the first few turns after restart, before any reset) and "most of the time it just completely forgets" (every turn after the first reset).
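
The four steps above can be reproduced with a minimal sketch. Everything here is a hypothetical stand-in: `FakeAgent`, `run_turn`, `simulate`, and the `"telegram:42"` key are illustrative names, not the gateway's real classes, and the in-memory dict only approximates the session DB. The `sync` flag toggles the check proposed in this PR.

```python
from collections import defaultdict


class FakeAgent:
    """Hypothetical stand-in for AIAgent; it only remembers a session_id."""

    def __init__(self, session_id):
        self.session_id = session_id

    def run_conversation(self, db, user_message):
        # Persists to whatever session_id the agent currently holds.
        db[self.session_id].append(user_message)


def run_turn(cache, db, session_key, session_id, message, sync=False):
    """One gateway turn: agents are cached by session_key, not session_id."""
    agent = cache.get(session_key)
    if agent is None:
        agent = cache[session_key] = FakeAgent(session_id)
    elif sync and agent.session_id != session_id:
        # The check proposed in this PR: re-point the cached agent.
        agent.session_id = session_id
    agent.run_conversation(db, message)


def simulate(sync):
    cache, db = {}, defaultdict(list)
    run_turn(cache, db, "telegram:42", "s1", "hello", sync)
    # Session reset: same session_key, new session_id.
    run_turn(cache, db, "telegram:42", "s2", "after reset", sync)
    return dict(db)
```

With `sync=False` both messages end up under `"s1"` and nothing is stored for `"s2"`, which matches the empty transcript in step 4. With `sync=True` the post-reset message is stored under `"s2"`.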

Fix

One block added after the cache hit, inside _run_agent():

if agent.session_id != session_id:
    logger.debug(
        "Cached agent session_id mismatch (%s → %s), syncing",
        agent.session_id, session_id,
    )
    agent.session_id = session_id
    agent._last_flushed_db_idx = 0

_last_flushed_db_idx is reset to 0 so that _flush_messages_to_session_db() writes all messages from the start of the new session (no messages have been flushed to it yet).

Why _last_flushed_db_idx = 0 is safe

The flush cursor tracks how many messages have already been persisted for the current session. After a reset, the new session has zero messages in the DB, so starting the cursor at 0 is correct. For an empty history, flush_from = max(len(history), 0) is already 0, so in that case the reset is redundant. It is still required when _last_flushed_db_idx carries a nonzero value over from the previous session's run: a stale cursor could otherwise cause the first messages of the new session to be skipped.
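
To illustrate why a carried-over cursor matters, here is a small sketch under stated assumptions. `CursorAgent` and `first_flush_after_reset` are hypothetical, and the slicing in `flush` is an assumed simplification of how `_flush_messages_to_session_db()` uses `_last_flushed_db_idx`; the real method may differ.

```python
class CursorAgent:
    """Hypothetical agent with a flush cursor (assumed simplification)."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.last_flushed_db_idx = 0

    def flush(self, db, messages):
        # Write only the messages the cursor has not covered yet.
        new = messages[self.last_flushed_db_idx:]
        db.setdefault(self.session_id, []).extend(new)
        self.last_flushed_db_idx = len(messages)
        return len(new)


def first_flush_after_reset(reset_cursor):
    db = {}
    agent = CursorAgent("s1")
    agent.flush(db, ["u1", "a1", "u2", "a2"])  # cursor is now 4
    agent.session_id = "s2"                    # session reset, agent reused
    if reset_cursor:
        agent.last_flushed_db_idx = 0
    agent.flush(db, ["u3", "a3"])              # the new session's history
    return db.get("s2", [])
```

In this model, syncing `session_id` alone is not enough: with the cursor left at 4, the two messages of the new session fall before the cursor and are never written.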

Scope

Single file, 14 lines added. All 1453 gateway tests pass.

When a session resets (idle expiry, daily reset, /new), the agent
cache still holds a reference to the old AIAgent instance with its
original session_id. On the next turn the cached agent is reused —
_flush_messages_to_session_db() then writes new messages to the old
(ended) session. load_transcript(new_session_id) returns empty, so
the model sees only the current message with no conversation history.

Fix: after pulling an agent from the cache, compare its session_id
to the current one. If they differ, set session_id to the current
value and reset _last_flushed_db_idx to 0 so the next flush writes
from the start of the new session.

Closes NousResearch#3210
alt-glitch added the type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), and P1 (High: major feature broken, no workaround) labels May 2, 2026
@teknium1

Contributor

This looks implemented on current main by the newer session-boundary cache handling, so I'm closing this from an automated hermes-sweeper review.

Evidence:

  • gateway/slash_commands.py:56 invalidates the run generation for /new / /reset, and gateway/slash_commands.py:68-78 cleans up and evicts the cached AIAgent before reset_session() rotates the session id.
  • gateway/session.py:1146-1196 creates the fresh SessionEntry with a new session_id, so the next post-reset turn builds against the new session after cache eviction.
  • gateway/run.py:5282-5301 covers idle/daily expiry finalization by cleaning up and evicting cached agents for expired sessions.
  • For auto-reset turns, gateway/run.py:7816-7824 adds a reset-specific context note, and gateway/run.py:13708-13724 includes that ephemeral prompt in the cache signature, forcing a stale pre-reset cached agent to miss rather than being reused.

The original PR's analysis was useful; main now solves the stale cached-agent/session boundary by evicting/replacing the cached agent instead of mutating agent.session_id in place.
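
For readers comparing the two approaches, the evict-or-miss strategy described above might look roughly like the sketch below. It is not the code on main: `AgentCache`, `cache_signature`, and the idea of hashing the system prompt together with the ephemeral prompt are assumptions made for illustration only.

```python
import hashlib


def cache_signature(system_prompt, ephemeral_prompt):
    """Hypothetical signature: any change in the ephemeral prompt changes it."""
    payload = "\x00".join([system_prompt, ephemeral_prompt or ""])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AgentCache:
    """Hypothetical cache keyed by session_key and guarded by a signature."""

    def __init__(self):
        self._entries = {}  # session_key -> (signature, agent)

    def get_or_build(self, session_key, signature, build):
        entry = self._entries.get(session_key)
        if entry is not None and entry[0] == signature:
            return entry[1]  # cache hit: same session context
        agent = build()      # miss: build a fresh agent and replace the entry
        self._entries[session_key] = (signature, agent)
        return agent

    def evict(self, session_key):
        # Explicit eviction, e.g. before a /new or /reset rotates the session id.
        self._entries.pop(session_key, None)
```

The design difference from this PR is that a stale agent is never re-pointed at a new session; it is either evicted explicitly or missed because its signature no longer matches, so per-agent state such as the flush cursor always starts fresh.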

teknium1 closed this Jun 10, 2026
teknium1 added the sweeper:implemented-on-main (Sweeper: behavior already present on current main) label Jun 10, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: Telegram chat does not include previous messages

3 participants