fix(gateway): sync Telegram topic binding after session split#23195
litvinovvo wants to merge 2 commits into
Keep Telegram DM topic bindings aligned when context compression or agent session switching rotates the active session id, so topic lanes resume the compressed child session instead of the stale parent. Fixes NousResearch#20470
Pull request overview
This PR fixes a gateway routing bug in Telegram DM topic mode. A topic's SQLite binding (telegram_dm_topic_bindings) could remain pointed at a pre-compression parent session after the active Hermes session_id rolled over to a compressed child session. Subsequent messages then resolved back to the stale parent transcript and repeatedly triggered compression.
Changes:
- Add a dedicated helper to safely refresh Telegram DM topic bindings when session_id changes, with exception containment and structured logging.
- Invoke that refresh helper on all relevant session-id rollover paths: hygiene/preflight compression, agent-result session switches, and run-time compression splits.
- Add a regression test ensuring a Telegram topic binding follows the compressed child session id.
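A minimal sketch of what a binding-refresh helper with exception containment might look like. Only the table name `telegram_dm_topic_bindings` and the `(chat_id, thread_id) -> session_id` mapping come from this PR; the column names, the function name `sync_topic_binding`, and the logger name are assumptions for illustration, not the code in gateway/run.py.

```python
import logging
import sqlite3

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("gateway.sketch")


def sync_topic_binding(conn, chat_id, thread_id, new_session_id, *, reason):
    """Point the (chat_id, thread_id) binding at new_session_id.

    Returns True if a row was rewritten, False if nothing changed or the
    update failed. Database errors are logged and swallowed so a binding
    failure cannot abort message handling.
    """
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                # Assumed schema: columns chat_id, thread_id, session_id.
                "UPDATE telegram_dm_topic_bindings SET session_id = ? "
                "WHERE chat_id = ? AND thread_id = ? AND session_id != ?",
                (new_session_id, chat_id, thread_id, new_session_id),
            )
        changed = cur.rowcount > 0
        if changed:
            logger.info(
                "topic binding synced chat_id=%s thread_id=%s session_id=%s reason=%s",
                chat_id, thread_id, new_session_id, reason,
            )
        return changed
    except sqlite3.Error:
        logger.exception(
            "topic binding sync failed chat_id=%s thread_id=%s reason=%s",
            chat_id, thread_id, reason,
        )
        return False
```

The `session_id != ?` guard makes the call idempotent, so it is cheap to invoke from every rollover path even when the binding is already current.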
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| gateway/run.py | Introduces a helper to refresh Telegram topic bindings after session switches and calls it from the key session_id rollover paths. |
| tests/gateway/test_telegram_topic_mode.py | Adds a regression test verifying the topic binding updates to the post-compression child session_id. |
Additional context for maintainers: I noticed there are already related attempts for #20470 (#20485, #20486, #21171), so I wanted to call out why this PR is intentionally a separate/updated fix rather than just another duplicate. Why I think this PR is a better candidate to review/merge:
The intent here is to encode the broader routing invariant: whenever the active Local validation run: |
|
Follow-up update: this PR now also covers a second Telegram topic-routing failure mode that can make replies appear in the General/private-chat lane. Root cause for this path: when Telegram reports the stored New behavior:
Validation:
|
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in SQLite so reopening a topic resumes the right Hermes session. When compression rotated session_entry.session_id mid-turn, the binding row stayed pointed at the pre-compression parent. On the next inbound message in that topic the gateway reloaded the oversized parent transcript, retriggering preflight compression, sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper called immediately after each of the three session_id rotation sites in _handle_message_with_agent (hygiene compression, agent-result compression rotation, /compress command). Keeps future bindings fresh.
2. Read-path self-heal: when resolving an existing topic binding, walk SessionDB.get_compression_tip() forward and switch_session to the descendant instead of the stored parent. Rewrites the binding row to the tip so subsequent messages skip the walk. Heals existing stale state on the next user message without requiring a gateway restart.

Skipped from competing PRs as not load-bearing for the bug:

- advance_session_after_compression SessionStore primitive (#26204/#28870/#33416): preserves end_reason='compression' analytics nicety but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch: _compress_context() already mutates tmp_agent.session_id on the cached object so the in-memory agent self-corrects.
- Startup repair pass (#33416): redundant once the read path heals on the next message; one-line CLI follow-up can address bindings for topics users never reopen.

Closes #20470, #29712, #33414.

Acknowledges work in #23195 (@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713 (@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
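The read-path self-heal described in the commit message above can be illustrated roughly as follows. This is a sketch under stated assumptions: a plain dict of parent-to-child links stands in for the session database, and `compression_tip` and `resolve_binding` are hypothetical names, not the real `SessionDB.get_compression_tip()` or the gateway's resolver.

```python
def compression_tip(children, session_id):
    """Follow parent -> child compression links until the newest descendant.

    `children` maps a session id to the session that replaced it after
    compression (an assumed stand-in for the session database).
    """
    seen = {session_id}
    current = session_id
    while current in children:
        nxt = children[current]
        if nxt in seen:  # defensive guard against a corrupted, cyclic chain
            break
        seen.add(nxt)
        current = nxt
    return current


def resolve_binding(bindings, children, key):
    """Return the session a topic should resume, healing a stale binding.

    `bindings` maps (chat_id, thread_id) to a stored session id. If the
    stored id has descendants, the binding is rewritten to the tip so later
    lookups skip the walk.
    """
    stored = bindings[key]
    tip = compression_tip(children, stored)
    if tip != stored:
        bindings[key] = tip
    return tip
```

Because the heal happens on lookup, a binding left stale by an older gateway version is corrected the first time the user writes in that topic again.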
|
Thanks for this work — the fix landed via PR #34409 (#34409), merged as commit I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum:
Skipped intentionally: the Closing as superseded — your work shaped the final design. Appreciate the thorough analysis. |
Summary
Problem
When a Telegram DM topic session is compacted, Hermes rotates from the pre-compression parent `session_id` to a new child `session_id`. The gateway updated the active `SessionEntry`, but the separate SQLite `telegram_dm_topic_bindings` row could remain pointed at the old parent session.

On the next inbound message in that topic, topic resolution could switch the lane back to the stale parent transcript, causing repeated preflight compression loops.
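To make the failure mode concrete, here is a toy model of the loop. Everything in it (`ToyGateway`, the token counts, the child-id naming) is invented for illustration and does not mirror the real gateway classes; it only shows why a binding that is not updated after rotation triggers compression on every message.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGateway:
    """Toy stand-in: topic bindings and transcript sizes kept in dicts."""
    bindings: dict = field(default_factory=dict)     # topic -> session_id
    transcripts: dict = field(default_factory=dict)  # session_id -> token count
    limit: int = 100                                 # compression threshold
    compressions: int = 0

    def handle(self, topic, *, sync_binding):
        """Resolve the topic's session, compressing if it is oversized."""
        session_id = self.bindings[topic]
        if self.transcripts[session_id] > self.limit:
            # Compression rotates to a new, smaller child session.
            child = f"{session_id}-c{self.compressions}"
            self.transcripts[child] = self.limit // 2
            self.compressions += 1
            if sync_binding:
                # The fix: keep the binding aligned with the rotation.
                self.bindings[topic] = child
            session_id = child
        return session_id


def count_compressions(sync_binding, messages=3):
    """Send several messages to one topic and count compressions."""
    gw = ToyGateway(bindings={"topic": "parent"}, transcripts={"parent": 500})
    for _ in range(messages):
        gw.handle("topic", sync_binding=sync_binding)
    return gw.compressions
```

With `sync_binding=False` every message resolves back to the oversized parent and compresses again; with `sync_binding=True` only the first message does.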
Fixes #20470.
Test Plan
- `/home/pc_lion/.hermes/hermes-agent/venv/bin/python -m ruff check gateway/run.py tests/gateway/test_telegram_topic_mode.py`
- `git diff --check -- gateway/run.py tests/gateway/test_telegram_topic_mode.py`
- `/home/pc_lion/.hermes/hermes-agent/venv/bin/python -m pytest tests/gateway/test_telegram_topic_mode.py tests/gateway/test_session_hygiene.py -q -o 'addopts='`

Notes
There are earlier related PRs (#20485, #20486, #21171), but this version starts from current `main`, avoids unrelated file changes, and updates all session-id rollover paths touched by this bug class rather than only the run-time split path.