fix(gateway): sync Telegram topic binding after session split #23195

Closed
litvinovvo wants to merge 2 commits into NousResearch:main from litvinovvo:fix/telegram-topic-binding-compression-split

@litvinovvo

Summary

  • Keep Telegram DM topic bindings in sync whenever the active gateway session id changes.
  • Cover context-compression session splits in hygiene/preflight, run-time compression, and agent result session-switch paths.
  • Add a regression test that verifies a topic binding follows the compressed child session id.

Problem

When a Telegram DM topic session is compacted, Hermes rotates from the pre-compression parent session_id to a new child session_id. The gateway updated the active SessionEntry, but the separate SQLite telegram_dm_topic_bindings row could remain pointed at the old parent session.

On the next inbound message in that topic, topic resolution could switch the lane back to the stale parent transcript, causing repeated preflight compression loops.

Fixes #20470.

Test Plan

  • /home/pc_lion/.hermes/hermes-agent/venv/bin/python -m ruff check gateway/run.py tests/gateway/test_telegram_topic_mode.py
  • git diff --check -- gateway/run.py tests/gateway/test_telegram_topic_mode.py
  • /home/pc_lion/.hermes/hermes-agent/venv/bin/python -m pytest tests/gateway/test_telegram_topic_mode.py tests/gateway/test_session_hygiene.py -q -o 'addopts='

Notes

There are earlier related PRs (#20485, #20486, #21171), but this version starts from current main, avoids unrelated file changes, and updates all session-id rollover paths touched by this bug class rather than only the run-time split path.

Keep Telegram DM topic bindings aligned when context compression or agent session switching rotates the active session id, so topic lanes resume the compressed child session instead of the stale parent.

Fixes NousResearch#20470
Copilot AI review requested due to automatic review settings May 10, 2026 13:10

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes a gateway routing bug in Telegram DM topic mode. A topic's SQLite binding (telegram_dm_topic_bindings) could remain pointed at a pre-compression parent session after the active Hermes session_id rolled over to a compressed child session. Subsequent messages then resolved back to the stale parent transcript and repeatedly triggered compression.

Changes:

  • Add a dedicated helper to safely refresh Telegram DM topic bindings when session_id changes, with exception containment and structured logging.
  • Invoke that refresh helper on all relevant session-id rollover paths: hygiene/preflight compression, agent-result session switches, and run-time compression splits.
  • Add a regression test ensuring a Telegram topic binding follows the compressed child session id.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| gateway/run.py | Introduces a helper to refresh Telegram topic bindings after session switches and calls it from the key session_id rollover paths. |
| tests/gateway/test_telegram_topic_mode.py | Adds a regression test verifying the topic binding updates to the post-compression child session_id. |

@litvinovvo

Author

Additional context for maintainers: I noticed there are already related attempts for #20470 (#20485, #20486, #21171), so I wanted to call out why this PR is intentionally a separate/updated fix rather than just another duplicate.

Why I think this PR is a better candidate to review/merge:

  • It is based on current main and has a clean, narrow scope: only gateway/run.py and tests/gateway/test_telegram_topic_mode.py are changed.
  • It includes a regression test for the actual invariant: after a compression/session split, the Telegram topic binding must point at the new child session_id rather than the stale parent.
  • It handles every session-id rollover path I found in the gateway topic lane, not just the run-time compression path:
    • hygiene/preflight compression split;
    • agent-result session switch after _handle_message_with_agent;
    • run-time compression split inside _run_agent.
  • #20485 ("fix: refresh Telegram DM topic binding after session split (#20470)") is still open but only patches the _run_agent split path and has no regression test.
  • #20486 ("fix(telegram): refresh topic binding after session split") came closer because it had a helper and a test, but it is closed and only covered the compression split helper path, not the other gateway session-switch paths above.
  • #21171 ("fix: refresh Telegram DM topic binding after session split (#20470)") is closed and mixed the fix with unrelated skill/account-usage changes, which makes it harder to review as a targeted bug fix.

The intent here is to encode the broader routing invariant: whenever the active session_id changes for a Telegram DM topic lane, the persisted (chat_id, thread_id) -> session_id binding must follow that new active session. Otherwise the next inbound topic message can resolve back to the pre-compression parent and re-trigger the compression loop.
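The invariant above can be illustrated with a minimal sketch. Only the table name telegram_dm_topic_bindings and the (chat_id, thread_id) -> session_id mapping come from this PR; the column types and the function names open_bindings, sync_topic_binding, and resolve_topic_session are hypothetical and not taken from gateway/run.py.

```python
import sqlite3


def open_bindings(path=":memory:"):
    """Open a binding store. The schema is an assumption for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS telegram_dm_topic_bindings (
            chat_id    INTEGER NOT NULL,
            thread_id  INTEGER NOT NULL,
            session_id TEXT    NOT NULL,
            PRIMARY KEY (chat_id, thread_id)
        )
        """
    )
    return conn


def sync_topic_binding(conn, chat_id, thread_id, new_session_id):
    """Point the topic's binding row at the currently active session id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO telegram_dm_topic_bindings "
            "(chat_id, thread_id, session_id) VALUES (?, ?, ?)",
            (chat_id, thread_id, new_session_id),
        )


def resolve_topic_session(conn, chat_id, thread_id):
    """Return the session id bound to a topic, or None if unbound."""
    row = conn.execute(
        "SELECT session_id FROM telegram_dm_topic_bindings "
        "WHERE chat_id = ? AND thread_id = ?",
        (chat_id, thread_id),
    ).fetchone()
    return row[0] if row else None
```

In this sketch, every code path that rotates the active session id would call sync_topic_binding right after the rotation, so the next resolve_topic_session call for that topic returns the child rather than the parent.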

Local validation run:

python -m ruff check gateway/run.py tests/gateway/test_telegram_topic_mode.py
git diff --check -- gateway/run.py tests/gateway/test_telegram_topic_mode.py
python -m pytest tests/gateway/test_telegram_topic_mode.py tests/gateway/test_session_hygiene.py -q -o 'addopts='
# 53 passed

@alt-glitch added the type/bug, P2, comp/gateway, and platform/telegram labels on May 10, 2026
@litvinovvo

Author

Follow-up update: this PR now also covers a second Telegram topic-routing failure mode that can make replies appear in the General/private-chat lane.

Root cause for this path: when Telegram reports the stored reply_to_message_id is stale (Message to be replied not found), the previous fallback dropped both the reply anchor and the topic routing metadata. In private-topic mode, dropping message_thread_id silently routes the message to General.

New behavior:

  • If the reply anchor is stale, retry without reply_to_message_id but keep message_thread_id.
  • If a private-topic message_thread_id itself is invalid, fail/log instead of silently sending to General.
  • Text and media paths now share this invariant.
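The fallback rule in the list above can be sketched as follows. This is an illustration under assumptions, not the code in gateway/platforms/telegram.py: SendError, send_in_topic, and the injected send callable are hypothetical names, and matching on the error text is a simplification of however the adapter detects a stale reply anchor.

```python
class SendError(Exception):
    """Hypothetical stand-in for the Bot API client's bad-request error."""


def _is_stale_reply(err):
    # Matches the "Message to be replied not found" error quoted above.
    return "replied not found" in str(err).lower()


def send_in_topic(send, chat_id, text, *, message_thread_id=None,
                  reply_to_message_id=None):
    """Send a message, dropping only a stale reply anchor on retry."""
    kwargs = {"chat_id": chat_id, "text": text}
    if message_thread_id is not None:
        kwargs["message_thread_id"] = message_thread_id
    if reply_to_message_id is not None:
        kwargs["reply_to_message_id"] = reply_to_message_id
    try:
        return send(**kwargs)
    except SendError as err:
        if reply_to_message_id is None or not _is_stale_reply(err):
            # Any other failure, such as an invalid thread, is surfaced
            # to the caller instead of silently rerouting to General.
            raise
        # Stale anchor: retry without it but keep message_thread_id.
        kwargs.pop("reply_to_message_id")
        return send(**kwargs)
```

The retry removes one key from the request rather than rebuilding it, which makes it hard to lose message_thread_id by accident.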

Validation:

  • ruff check gateway/run.py gateway/platforms/telegram.py tests/gateway/test_telegram_topic_mode.py tests/gateway/test_telegram_thread_fallback.py
  • pytest tests/gateway/test_telegram_topic_mode.py::test_topic_binding_follows_session_id_after_compression_split tests/gateway/test_telegram_thread_fallback.py -q -o 'addopts='
  • Result: 35 passed.

teknium1 added a commit that referenced this pull request May 29, 2026
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
  #28870/#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (#33416) — redundant once the read path heals on
  the next message; one-line CLI follow-up can address bindings for
  topics users never reopen.

Closes #20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
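The read-path self-heal described in the commit message above can be sketched roughly as below. The real SessionDB.get_compression_tip() signature is not shown in this thread, so compression_tip and resolve_binding are hypothetical stand-ins, with plain dicts in place of the SQLite tables.

```python
def compression_tip(children, session_id):
    """Follow parent -> compressed-child links to the newest descendant.

    `children` maps a parent session id to its compressed child; this
    is an assumed shape, not the actual SessionDB storage.
    """
    seen = {session_id}
    current = session_id
    while current in children:
        nxt = children[current]
        if nxt in seen:  # defensive guard against cyclic data
            break
        seen.add(nxt)
        current = nxt
    return current


def resolve_binding(bindings, children, key):
    """Resolve a (chat_id, thread_id) key, healing a stale binding."""
    stored = bindings.get(key)
    if stored is None:
        return None
    tip = compression_tip(children, stored)
    if tip != stored:
        # Rewrite the binding so later lookups skip the walk.
        bindings[key] = tip
    return tip
```

Because the walk runs on lookup, a binding left stale by an older gateway version would be repaired the first time the user writes in that topic again.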
@teknium1

Contributor

Thanks for this work. The fix landed via PR #34409, merged as commit db96fc60d.

I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum:

  1. Your _sync_telegram_topic_binding helper pattern after each session_id rotation site (Family A, this PR's most direct ancestor).
  2. The compression-tip read-path self-heal from @bizyumov's PRs (Family B), so existing already-stale bindings recover on the next message without a restart.

Skipped intentionally: the advance_session_after_compression SessionStore primitive (analytics nicety), explicit cached-agent eviction (_compress_context() already mutates tmp_agent.session_id on the cached object), and the startup repair pass (redundant once the read path self-heals).

Closing as superseded — your work shaped the final design. Appreciate the thorough analysis.

@teknium1 closed this on May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
…hildren (NousResearch#34409)

Development

Successfully merging this pull request may close these issues.

[Bug] Telegram DM topic binding not refreshed after compression-induced session split — causes preflight compression loop
