fix: harden Telegram topic routing across session splits #28870

Closed
donrhmexe wants to merge 1 commit into NousResearch:main from donrhmexe:fix/telegram-topic-compression-routing

@donrhmexe donrhmexe commented May 19, 2026

Contributor

Summary

  • advance gateway routes to compression-created child sessions without rewriting compression lineage as ordinary session switches
  • keep Telegram DM-topic bindings aligned across both runtime and pre-agent hygiene compression splits
  • resolve stale Telegram topic bindings to the latest compression tip before transcript/history load
  • evict cached agents when their loaded session id disagrees with the canonical route session
  • preserve Telegram topic routing when retrying sends after a stale/deleted reply anchor
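The cached-agent eviction rule in the summary can be illustrated with a minimal sketch. The `CachedAgent` class, the `evict_if_mismatched` helper, and the route-key format are hypothetical names chosen for illustration; they are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedAgent:
    # Hypothetical stand-in for a cached agent; only the loaded session id matters here.
    session_id: str


def evict_if_mismatched(
    cache: Dict[str, CachedAgent], route_key: str, canonical_session_id: str
) -> bool:
    """Drop the cached agent when it was loaded for a different session.

    Returns True when an entry was evicted, False otherwise.
    """
    agent = cache.get(route_key)
    if agent is not None and agent.session_id != canonical_session_id:
        del cache[route_key]
        return True
    return False


# The route now points at the compression child, but the cache still holds
# an agent that was loaded for the parent session.
cache = {"telegram:42:7": CachedAgent(session_id="sess-parent")}
evicted = evict_if_mismatched(cache, "telegram:42:7", "sess-child")

# A matching agent is left in place.
cache_ok = {"telegram:42:7": CachedAgent(session_id="sess-child")}
kept = not evict_if_mismatched(cache_ok, "telegram:42:7", "sess-child")
```

Evicting on mismatch forces the next turn to rebuild the agent from the canonical session rather than continuing with state loaded from the old one.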

Why

Long Telegram DM-topic conversations can cross compression/hygiene thresholds and create child sessions. If the JSON route/session index advances but telegram_dm_topic_bindings still points at the pre-compression/root session, the next Telegram message reloads the old large transcript. That can cause repeated compression, stale cached-agent promotion, and visible topic/session chaos.
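The failure mode described above can be reproduced in a few lines. This is a simplified model, assuming two plain dictionaries stand in for the JSON route/session index and the `telegram_dm_topic_bindings` table; the key formats and function names are hypothetical.

```python
# Hypothetical in-memory stand-ins for the two stores named above.
route_index = {"telegram:42:7": "sess-parent"}   # JSON route/session index
topic_bindings = {(42, 7): "sess-parent"}        # telegram_dm_topic_bindings rows


def compress(route_key: str, child_session_id: str) -> None:
    """Simulate a compression split that advances only the route index."""
    route_index[route_key] = child_session_id


def session_for_inbound(chat_id: int, thread_id: int) -> str:
    """Resolve the session for an inbound topic message from the binding table."""
    return topic_bindings[(chat_id, thread_id)]


compress("telegram:42:7", "sess-child")

# The binding was never updated, so the next message resolves to the
# pre-compression session and would reload its large transcript.
stale_session = session_for_inbound(42, 7)
diverged = stale_session != route_index["telegram:42:7"]
```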

This patch is intentionally broader than a single rebind call. It treats compression as route publication, protects compression lineage, self-heals stale bindings, prevents cached-agent/session mismatches, and avoids falling back to the wrong Telegram topic when only the reply anchor is stale.

Related issues and PRs consulted

Main changes

  • Add SessionStore.advance_session_after_compression() as a narrow alternative to switch_session() for compression continuations.
  • Add gateway helpers to:
    • follow SessionDB.get_compression_tip() for stale Telegram bindings
    • refresh Telegram topic bindings after session id rollover
    • recover a lost source.thread_id from the old binding before rebinding
    • evict cached agents with mismatched session ids
  • Apply the route-publication path in:
    • inbound Telegram topic binding resolution before history load
    • pre-agent hygiene compression
    • runtime compression split handling after an agent turn
  • Preserve message_thread_id/topic routing when Telegram rejects only a stale reply_to_message_id.
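The last item, retrying a send without the stale reply anchor while keeping the topic, can be sketched as follows. The client class, the exception type, and the error text are assumptions made for illustration; the actual adapter and the way it detects a stale `reply_to_message_id` may differ.

```python
class StaleReplyError(Exception):
    """Hypothetical error raised when reply_to_message_id no longer exists."""


class FakeTelegramClient:
    """Stand-in client that rejects sends replying to a deleted message."""

    def __init__(self, deleted_message_ids):
        self.deleted = set(deleted_message_ids)
        self.sent = []

    def send_message(self, chat_id, text, message_thread_id=None,
                     reply_to_message_id=None):
        if reply_to_message_id in self.deleted:
            raise StaleReplyError("message to be replied not found")
        payload = {
            "chat_id": chat_id,
            "text": text,
            "message_thread_id": message_thread_id,
            "reply_to_message_id": reply_to_message_id,
        }
        self.sent.append(payload)
        return payload


def send_with_topic_preserved(client, chat_id, text, message_thread_id,
                              reply_to_message_id):
    """Send a message; on a stale reply anchor, retry in the same topic."""
    try:
        return client.send_message(
            chat_id, text,
            message_thread_id=message_thread_id,
            reply_to_message_id=reply_to_message_id,
        )
    except StaleReplyError:
        # Drop only the stale anchor. message_thread_id is kept so the
        # retry lands in the original topic, not the general chat.
        return client.send_message(
            chat_id, text, message_thread_id=message_thread_id
        )


client = FakeTelegramClient(deleted_message_ids=[1001])
result = send_with_topic_preserved(
    client, chat_id=42, text="hello", message_thread_id=7,
    reply_to_message_id=1001,
)
```

The design point is that the two parameters fail independently: a deleted reply target says nothing about whether the topic is still valid, so the retry removes only the parameter that was rejected.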

Test plan

  • python -m pytest tests/gateway/test_telegram_topic_mode.py tests/gateway/test_telegram_thread_fallback.py -o 'addopts=' -q

Result: 87 passed in 12.98s

Additional local verification from the development session:

  • dependency-focused subsets covering ACP/FastAPI/WebSocket collection passed: 434 passed
  • full suite was attempted but exceeded the local 600s timeout before completion

@donrhmexe changed the title from "fix: preserve Telegram topic routes across compression" to "fix: harden Telegram topic routing across session splits" May 19, 2026
@donrhmexe force-pushed the fix/telegram-topic-compression-routing branch from 53c2a5b to f47fe2f May 19, 2026 18:21
@donrhmexe force-pushed the fix/telegram-topic-compression-routing branch from f47fe2f to 0657180 May 19, 2026 18:26
@alt-glitch added labels type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter), P1 (High: major feature broken, no workaround) May 19, 2026
@donrhmexe

Contributor Author

Closing at author request.

@donrhmexe donrhmexe closed this May 19, 2026
@donrhmexe donrhmexe deleted the fix/telegram-topic-compression-routing branch May 19, 2026 18:27
@alt-glitch

Collaborator

Supersedes #26204 and #26088 — broader fix covering compression route publication, lineage preservation, stale binding self-heal, cached-agent eviction, and Telegram reply anchor fallback. Also addresses #20470 and #27166.

@donrhmexe donrhmexe mentioned this pull request May 19, 2026
@donrhmexe donrhmexe reopened this May 19, 2026
teknium1 added a commit that referenced this pull request May 29, 2026
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.
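The two prongs above can be sketched together. This is an illustrative model, not the merged code: the `SessionDB` class here is a stand-in, the `get_compression_tip` signature is assumed, and `sync_topic_binding` and `resolve_binding` are hypothetical names operating on a plain dictionary instead of the SQLite binding table.

```python
class SessionDB:
    """Hypothetical stand-in that records parent -> child compression links."""

    def __init__(self):
        self._child_of = {}

    def record_compression(self, parent_id, child_id):
        self._child_of[parent_id] = child_id

    def get_compression_tip(self, session_id):
        """Follow compression links forward to the newest descendant."""
        seen = set()
        # The seen set guards against an accidental cycle in the links.
        while session_id in self._child_of and session_id not in seen:
            seen.add(session_id)
            session_id = self._child_of[session_id]
        return session_id


def sync_topic_binding(bindings, chat_id, thread_id, session_id):
    """Prong 1: write the new session id right after a rotation."""
    bindings[(chat_id, thread_id)] = session_id


def resolve_binding(bindings, db, chat_id, thread_id):
    """Prong 2: on read, heal a stale binding by walking to the tip."""
    stored = bindings[(chat_id, thread_id)]
    tip = db.get_compression_tip(stored)
    if tip != stored:
        # Rewrite the row so later messages skip the walk.
        bindings[(chat_id, thread_id)] = tip
    return tip


db = SessionDB()
db.record_compression("sess-parent", "sess-child-1")
db.record_compression("sess-child-1", "sess-child-2")

# A binding left stale by an older gateway version heals on the next read.
bindings = {(42, 7): "sess-parent"}
healed = resolve_binding(bindings, db, 42, 7)

# A later rotation keeps the binding fresh at write time.
db.record_compression("sess-child-2", "sess-child-3")
sync_topic_binding(bindings, 42, 7, "sess-child-3")
```

The write-time sync prevents new stale rows, while the read-time walk repairs rows that were already stale, so neither prong alone covers both cases.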

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
  #28870/#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (#33416) — redundant once the read path heals on
  the next message; one-line CLI follow-up can address bindings for
  topics users never reopen.

Closes #20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
@teknium1

Contributor

Thanks for this work. The fix landed via PR #34409, merged as commit db96fc60d.

I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum:

  1. Your _sync_telegram_topic_binding helper pattern after each session_id rotation site (Family A, this PR's most direct ancestor).
  2. The compression-tip read-path self-heal from @bizyumov's PRs (Family B), so existing already-stale bindings recover on the next message without a restart.

Skipped intentionally: the advance_session_after_compression SessionStore primitive (analytics nicety), explicit cached-agent eviction (_compress_context() already mutates tmp_agent.session_id on the cached object), and the startup repair pass (redundant once the read path self-heals).

Closing as superseded — your work shaped the final design. Appreciate the thorough analysis.

@teknium1 teknium1 closed this May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
…hildren (NousResearch#34409)
