
fix(gateway): keep Telegram topic routing explicit#29945

Closed
eugeneb1ack wants to merge 2 commits into NousResearch:main from eugeneb1ack:fix/telegram-topic-binding-recovery


@eugeneb1ack


Summary

  • sync Telegram DM topic bindings when compression rotates the underlying session id
  • stop rewriting explicit non-General DM topic ids to the user's last-active topic
  • keep recovery for missing/root topic ids and update topic-mode regression tests

Why

A newly opened Telegram topic can arrive with an explicit thread id that is not yet bound to any session. Treating that unknown id as recoverable forces the turn back into the previous topic, so the agent appears to reply in the wrong topic.
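The routing rule in the Summary and above can be sketched as a small pure function. This is an illustration under stated assumptions, not the gateway's code: the function name `resolve_topic_thread_id` and the `general_thread_id` parameter are hypothetical, and how the real gateway identifies the root/General topic is not shown in this PR description. The point is that only a missing or root thread id falls back to the last-active topic.

```python
from typing import Optional


def resolve_topic_thread_id(
    incoming: Optional[int],
    last_active: Optional[int],
    general_thread_id: Optional[int] = None,
) -> Optional[int]:
    """Choose the thread id a DM turn should be routed to (illustrative sketch).

    Only a missing id (None) or the root/General id is treated as recoverable.
    Any other explicit id is kept, even if no session is bound to it yet.
    """
    is_root = incoming is None or (
        general_thread_id is not None and incoming == general_thread_id
    )
    if is_root and last_active is not None:
        # Recovery path: fall back to the user's last-active topic.
        return last_active
    # Explicit (possibly brand-new) topic: never rewrite it.
    return incoming
```

With this rule, `resolve_topic_thread_id(42, 7)` keeps the new topic 42, while `resolve_topic_thread_id(None, 7)` recovers to topic 7.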

Compression can also rotate the Hermes session id while the Telegram topic stays the same. The binding must follow the compressed child session so later turns do not snap back to the pre-compression parent.
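A minimal sketch of "the binding must follow the compressed child session", assuming an in-memory dict in place of the persisted binding table. `SessionEntry`, `TopicBindings`, and `rotate_session_after_compression` are hypothetical names; the idea shown is only that the binding is rewritten at the same moment the session id rotates.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

TopicKey = Tuple[int, int]  # (chat_id, thread_id)


@dataclass
class SessionEntry:
    session_id: str


@dataclass
class TopicBindings:
    """In-memory stand-in for the persisted (chat_id, thread_id) -> session_id map."""

    rows: Dict[TopicKey, str] = field(default_factory=dict)

    def bind(self, key: TopicKey, session_id: str) -> None:
        self.rows[key] = session_id

    def lookup(self, key: TopicKey) -> Optional[str]:
        return self.rows.get(key)


def rotate_session_after_compression(
    entry: SessionEntry, bindings: TopicBindings, key: TopicKey, child_session_id: str
) -> None:
    """Hypothetical rotation site: swap in the compressed child, then sync the binding."""
    entry.session_id = child_session_id
    # Without this line the topic would keep pointing at the pre-compression parent.
    bindings.bind(key, entry.session_id)
```

Doing the sync inside the rotation step, rather than at some later point, means no inbound message can observe the topic still bound to the parent.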

Test plan

  • ./venv/bin/python -m pytest -q tests/gateway/test_telegram_topic_mode.py -k 'recover or topic_binding_follows_session_id_rotation_after_compression'
  • ./venv/bin/python -m pytest -q tests/gateway/test_telegram_topic_mode.py

Related: #20470, #27166, #29712

hehehe0803 and others added 2 commits May 21, 2026 19:49
@alt-glitch added the labels type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter), and P2 (Medium: degraded but workaround exists) on May 21, 2026
@alt-glitch

Collaborator

Competing with #28870 (P1, broader Telegram topic hardening) and #23195 (P2, session-split binding sync). All address #20470 (compression session topic binding) and related issues.

teknium1 added a commit that referenced this pull request May 29, 2026
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.
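The stale-row failure described above can be reproduced with a toy SQLite table. The table name `topic_bindings` and its columns are assumptions for illustration, not the Hermes schema; only the `(chat_id, thread_id) -> session_id` shape comes from the commit message.

```python
import sqlite3
from typing import Optional


def open_bindings_db() -> sqlite3.Connection:
    """Create an in-memory binding table keyed by (chat_id, thread_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE topic_bindings (
            chat_id    INTEGER NOT NULL,
            thread_id  INTEGER NOT NULL,
            session_id TEXT    NOT NULL,
            PRIMARY KEY (chat_id, thread_id)
        )
        """
    )
    return conn


def upsert_binding(conn: sqlite3.Connection, chat_id: int, thread_id: int, session_id: str) -> None:
    # INSERT OR REPLACE overwrites any existing row with the same primary key.
    conn.execute(
        "INSERT OR REPLACE INTO topic_bindings (chat_id, thread_id, session_id) "
        "VALUES (?, ?, ?)",
        (chat_id, thread_id, session_id),
    )


def get_binding(conn: sqlite3.Connection, chat_id: int, thread_id: int) -> Optional[str]:
    row = conn.execute(
        "SELECT session_id FROM topic_bindings WHERE chat_id = ? AND thread_id = ?",
        (chat_id, thread_id),
    ).fetchone()
    return row[0] if row else None
```

If compression changes only the in-memory session id, `get_binding` keeps returning the parent id until something rewrites the row, which is exactly the path that reloads the oversized parent transcript.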

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.
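The read-path self-heal in item 2 can be sketched as a walk over parent-to-child compression links. This does not reproduce the signature of `SessionDB.get_compression_tip()`: the `compressed_into` dict is a hypothetical stand-in for whatever lineage data that method consults, and `compression_tip` / `resolve_topic_session` are illustrative names.

```python
from typing import Dict, Optional, Tuple

TopicKey = Tuple[int, int]  # (chat_id, thread_id)


def compression_tip(session_id: str, compressed_into: Dict[str, str]) -> str:
    """Follow parent -> child compression links to the newest descendant."""
    seen = {session_id}
    current = session_id
    while current in compressed_into:
        child = compressed_into[current]
        if child in seen:  # defensive: stop on a malformed cycle
            break
        seen.add(child)
        current = child
    return current


def resolve_topic_session(
    bindings: Dict[TopicKey, str],
    key: TopicKey,
    compressed_into: Dict[str, str],
) -> Optional[str]:
    """Return the session for a topic, healing a stale binding in place."""
    stored = bindings.get(key)
    if stored is None:
        return None  # no binding yet: the caller would create a new session
    tip = compression_tip(stored, compressed_into)
    if tip != stored:
        # Rewrite the binding so the next message skips the walk.
        bindings[key] = tip
    return tip
```

For a chain `s1 -> s2 -> s3` with the topic still bound to `s1`, the first lookup returns `s3` and rewrites the binding, so later lookups resolve immediately.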

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
  #28870/#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (#33416) is redundant once the read path heals on
  the next message; a one-line CLI follow-up can address bindings for
  topics users never reopen.
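Why skipping cached-agent eviction is safe can be illustrated with plain Python reference semantics. This sketch assumes the cache holds a reference to the same agent object and is keyed by something that does not change on compression (for example the topic key); `CachedAgent`, `compress_context`, and `needs_eviction` are hypothetical stand-ins, not the real `_compress_context()` or `tmp_agent`.

```python
from dataclasses import dataclass


@dataclass
class CachedAgent:
    session_id: str


def compress_context(agent: CachedAgent, child_session_id: str) -> None:
    """Stand-in for _compress_context(): mutates the cached object in place."""
    agent.session_id = child_session_id


def needs_eviction(agent: CachedAgent, entry_session_id: str) -> bool:
    """Hypothetical mismatch check of the kind the skipped eviction logic would use."""
    return agent.session_id != entry_session_id
```

Because the cache entry and the compressing code share one object, the cached agent already carries the child session id after compression, so a mismatch check would never fire. If the cache were keyed by session id instead, the key itself could go stale, which is outside what this sketch models.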

Closes #20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
@teknium1

Contributor

Thanks for this work. The fix landed via PR #34409, merged as commit db96fc60d.

I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum:

  1. Your _sync_telegram_topic_binding helper pattern after each session_id rotation site (Family A, this PR's most direct ancestor).
  2. The compression-tip read-path self-heal from @bizyumov's PRs (Family B), so existing already-stale bindings recover on the next message without a restart.

Skipped intentionally: the advance_session_after_compression SessionStore primitive (analytics nicety), explicit cached-agent eviction (_compress_context() already mutates tmp_agent.session_id on the cached object), and the startup repair pass (redundant once the read path self-heals).

Closing as superseded — your work shaped the final design. Appreciate the thorough analysis.

@teknium1 teknium1 closed this May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
…hildren (NousResearch#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (NousResearch#26204/
  NousResearch#28870/NousResearch#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (NousResearch#33416) — redundant once the read path heals on
  the next message; one-line CLI follow-up can address bindings for
  topics users never reopen.

Closes NousResearch#20470, NousResearch#29712, NousResearch#33414. Acknowledges work in NousResearch#23195
(@litvinovvo), NousResearch#26204 (@bizyumov), NousResearch#28870 (@donrhmexe), NousResearch#29713
(@hehehe0803), NousResearch#29945 (@eugeneb1ack), NousResearch#33416 (@bizyumov).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment


4 participants