Describe the bug
In Telegram DM topic mode, _recover_telegram_topic_thread_id in gateway/run.py rewrites the inbound thread_id of every brand-new topic to the user's most-recently-bound topic, hijacking the new conversation into the previous lane. The hijack is self-reinforcing: because the rewrite happens before _record_telegram_topic_binding, the new topic's binding row is never written, so the next inbound also looks "unknown" and is hijacked again. The freshly-created topic never recovers on its own.
User-visible symptoms:
- "I type in topic X, but the reply appears in topic Y."
- "I sent several messages and got no reply at all" (the agent is busy in / interrupted on the wrong lane, or has compressed-rolled-back state).
- The first message in any new topic is dropped into whatever topic was last active.
Reproduction Steps
- Enable Telegram DM topic mode and use it long enough to accumulate at least one topic binding.
- From "All Messages" (or by clicking "New Chat" in the topic strip), open a brand-new topic and send a message in it.
- Expected: the agent replies in the new topic and a binding row is written for the new thread_id.
- Actual: the gateway logs telegram topic recovery: chat=... user=... '<new_thread>' -> <last_active_thread>, the message is processed against the previous topic's session, and the reply appears in the previous topic. The new topic has no binding row.
- Subsequent messages in the new topic continue to be hijacked.
Root cause
gateway/run.py _recover_telegram_topic_thread_id (introduced in commit ede47a54b, "fix(gateway): pin Telegram DM-topic routing to user's current topic"):
inbound = str(source.thread_id or "")
is_lobby = not inbound or inbound in self._TELEGRAM_GENERAL_TOPIC_IDS
known = {str(b.get("thread_id") or "") for b in bindings}
if not is_lobby and inbound in known:
    return None
# ... falls through and rewrites to the user's most-recent binding
The commit was intended to address two real Telegram quirks: (a) "Reply on a message in another topic" leaks the other topic's message_thread_id, and (b) _build_message_event strips thread_id on plain replies (#3206 — required for non-topic users). Both of those legitimately produce wrong/missing thread_id values and should be recovered.
The bug is that the function does not distinguish between:
- A cross-topic-Reply leak (rare — needs the user to long-press-reply onto a message in another topic), and
- A brand-new topic the user just opened (common — every "New Chat" creates one).
Both look identical to the function: an explicit, non-lobby thread_id that isn't yet in telegram_dm_topic_bindings. The "unknown topic → snap to most-recent" arm treats every fresh topic as a leak.
The trap closes on itself: _recover_telegram_topic_thread_id runs before _record_telegram_topic_binding, so the hijacked thread_id is what gets bound. The original new topic never gets a binding row, so the next message also matches the "unknown" arm, and so on.
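The self-reinforcing loop can be reproduced outside the gateway with a small simulation. This is a sketch under stated assumptions, not the real code: `recover_buggy` and `handle_inbound` are hypothetical stand-ins, the fall-through arm is reconstructed from the description above (the snippet elides it), user matching is omitted for brevity, and `GENERAL_TOPIC_IDS` is a placeholder for `_TELEGRAM_GENERAL_TOPIC_IDS`.

```python
from typing import Optional

# Placeholder for self._TELEGRAM_GENERAL_TOPIC_IDS; the real values are not shown in this issue.
GENERAL_TOPIC_IDS = {"1"}


def recover_buggy(inbound: str, bindings: list) -> Optional[str]:
    """Mimic the current behaviour: an unknown explicit id snaps to the newest binding."""
    is_lobby = not inbound or inbound in GENERAL_TOPIC_IDS
    known = {str(b.get("thread_id") or "") for b in bindings}
    if not is_lobby and inbound in known:
        return None
    # Assumed shape of the elided fall-through arm.
    for b in bindings:  # newest-first
        recovered = str(b.get("thread_id") or "")
        if recovered and recovered != inbound:
            return recovered
    return None


def handle_inbound(inbound: str, bindings: list) -> str:
    """Recovery runs first, so the rewritten id is the one that gets bound."""
    effective = recover_buggy(inbound, bindings) or inbound
    if effective not in {b["thread_id"] for b in bindings}:
        bindings.insert(0, {"thread_id": effective})
    return effective


bindings = [{"thread_id": "563"}]
routed = [handle_inbound("573", bindings) for _ in range(3)]
print(routed)    # ['563', '563', '563']
print(bindings)  # [{'thread_id': '563'}]
```

Topic 573 never enters `bindings`, so every later message from it takes the same path.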
Evidence from a live instance
Every message in a freshly-opened topic this morning got rewritten to the previously-active topic:
gateway.log:
07:27:20 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
07:50:45 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
08:01:42 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
... (every message from topic 573 redirected to 563 for hours) ...
10:45:17 telegram topic recovery: chat=313975948 user=313975948 '652' -> 563 # new topic, same hijack
State divergence after the hijack ran for hours:
sessions.json : agent:main:telegram:dm:313975948:573 → 20260523_073347_9a3c4a07 (created on first send, then orphaned)
state.db bindings : (no row for thread_id=573)
state.db bindings : thread=563 → 20260523_093515_852a1b # all of topic 573's messages landed here
Topic 573 had a SessionStore JSON entry from the moment its first message arrived (07:33:47), but its telegram_dm_topic_bindings row was never written because every message in that topic was rerouted to 563 in _build_message_event post-processing before binding creation. Session 20260523_073347_9a3c4a07 for topic 573 has message_count = 0 despite the user sending many messages into that topic.
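To check whether another install is affected, the recovery lines can be tallied per source and target thread. The sketch below assumes the log line format matches the excerpt quoted above exactly; `tally_redirects` is a hypothetical helper, not part of Hermes.

```python
import re
from collections import Counter

# Pattern derived from the gateway.log excerpt in this issue.
RECOVERY_RE = re.compile(
    r"telegram topic recovery: chat=(?P<chat>\d+) user=(?P<user>\d+) "
    r"'(?P<src>[^']*)' -> (?P<dst>\S+)"
)


def tally_redirects(lines):
    """Count (source_thread, target_thread) pairs across recovery log lines."""
    counts = Counter()
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match:
            counts[(match.group("src"), match.group("dst"))] += 1
    return counts


sample = [
    "07:27:20 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563",
    "07:50:45 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563",
    "08:01:42 telegram topic recovery: chat=313975948 user=313975948 '573' -> 563",
    "10:45:17 telegram topic recovery: chat=313975948 user=313975948 '652' -> 563",
]
for (src, dst), n in tally_redirects(sample).most_common():
    print(f"{src} -> {dst}: {n}")
```

A source thread that is redirected repeatedly and never appears as a target is a likely trapped topic.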
Proposed fix
Drop the "unknown topic → snap back" arm. An explicit, non-lobby thread_id must be trusted as-is. The legitimate cross-topic-Reply leak case (the rarer of the two) self-corrects on the next message the user sends in the right topic — a one-message inconvenience is much cheaper than permanently trapping every new topic.
inbound = str(source.thread_id or "")
is_lobby = not inbound or inbound in self._TELEGRAM_GENERAL_TOPIC_IDS
if not is_lobby:
    # Only rewrite when the inbound id is missing/lobby. An explicit,
    # non-lobby thread_id must be trusted as-is even when it isn't in
    # our bindings table: a brand-new topic the user just created has
    # no binding row yet, and rewriting it to the most-recent topic
    # traps every fresh topic against the previous one.
    return None
user_id = str(source.user_id)
for b in bindings:  # newest-first
    if str(b.get("user_id") or "") == user_id:
        recovered = str(b.get("thread_id") or "")
        if recovered and recovered != inbound:
            return recovered
        return None
return None
This preserves the original genuine win — snap a stripped/lobby thread_id back to the user's current topic — and removes the over-corrective arm.
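The intended behaviour can be exercised in isolation with a standalone version of the patched logic. This is a sketch, not the gateway method itself: `recover_fixed` is a hypothetical free function taking plain arguments instead of `source` and `self`, and `GENERAL_TOPIC_IDS` is a placeholder for `_TELEGRAM_GENERAL_TOPIC_IDS`.

```python
from typing import Optional

# Placeholder for self._TELEGRAM_GENERAL_TOPIC_IDS; the real values are not shown in this issue.
GENERAL_TOPIC_IDS = {"1"}


def recover_fixed(inbound_thread_id, user_id, bindings) -> Optional[str]:
    """Return a replacement thread id only when the inbound id is missing or lobby."""
    inbound = str(inbound_thread_id or "")
    is_lobby = not inbound or inbound in GENERAL_TOPIC_IDS
    if not is_lobby:
        # Explicit, non-lobby ids are trusted, bound or not.
        return None
    uid = str(user_id)
    for b in bindings:  # newest-first
        if str(b.get("user_id") or "") == uid:
            recovered = str(b.get("thread_id") or "")
            if recovered and recovered != inbound:
                return recovered
            return None
    return None


bindings = [{"user_id": "313975948", "thread_id": "563"}]

# Brand-new topic with no binding row: left alone.
assert recover_fixed("573", "313975948", bindings) is None
# Stripped thread_id (plain reply): snapped back to the user's current topic.
assert recover_fixed(None, "313975948", bindings) == "563"
# Already-bound topic: left alone.
assert recover_fixed("563", "313975948", bindings) is None
# A user with no bindings has nothing to recover to.
assert recover_fixed(None, "42", bindings) is None
```

The first assertion is the case the current code gets wrong; the second is the original win that the fix keeps.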
Local repair for already-affected installs
For topics that already have a SessionStore entry but no SQLite binding row (i.e., the hijack trapped them):
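INSERT INTO telegram_dm_topic_bindings
    (chat_id, thread_id, user_id, session_key, session_id, managed_mode, linked_at, updated_at)
VALUES
    (?, ?, ?, ?, ?, 'auto', strftime('%s','now'), strftime('%s','now'));
Here session_id is the entry already present in ~/.hermes/sessions/sessions.json for that topic key, and session_key is the topic key itself (for example agent:main:telegram:dm:313975948:573). The key is passed as a single bound parameter because SQLite does not substitute ? placeholders that appear inside a quoted string literal.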
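The repair can also be applied from Python with the standard sqlite3 module, building the session key in Python so that every value is a bound parameter. This is a sketch under assumptions: `repair_binding` is a hypothetical helper, the key layout `agent:main:telegram:dm:<chat_id>:<thread_id>` is inferred from the sessions.json entry quoted earlier, and the column types in the demo table are guesses (the real schema is not shown in this issue). Back up state.db before running anything like this against a live install.

```python
import sqlite3


def repair_binding(conn, chat_id, thread_id, user_id, session_id):
    """Insert the missing binding row for a trapped topic and return its session key."""
    # Assumed key layout, inferred from the sessions.json entry in this issue.
    session_key = f"agent:main:telegram:dm:{chat_id}:{thread_id}"
    conn.execute(
        """
        INSERT INTO telegram_dm_topic_bindings
            (chat_id, thread_id, user_id, session_key, session_id,
             managed_mode, linked_at, updated_at)
        VALUES (?, ?, ?, ?, ?, 'auto', strftime('%s','now'), strftime('%s','now'))
        """,
        (str(chat_id), str(thread_id), str(user_id), session_key, session_id),
    )
    conn.commit()
    return session_key


# Demo against an in-memory table; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE telegram_dm_topic_bindings (
        chat_id TEXT, thread_id TEXT, user_id TEXT, session_key TEXT,
        session_id TEXT, managed_mode TEXT, linked_at INTEGER, updated_at INTEGER
    )
    """
)
key = repair_binding(conn, "313975948", "573", "313975948", "20260523_073347_9a3c4a07")
row = conn.execute(
    "SELECT thread_id, session_key, session_id FROM telegram_dm_topic_bindings"
).fetchone()
print(row)
```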
Verification
I patched the function locally as above and updated the relevant regression test in tests/gateway/test_telegram_topic_mode.py (the previously-passing test_recover_rewrites_unknown_thread_id_to_most_recent test encoded the buggy behaviour; it was renamed to test_recover_leaves_unknown_explicit_thread_id_alone and inverted). All other tests in the file still pass:
$ python -m pytest tests/gateway/test_telegram_topic_mode.py -q
44 passed
After the gateway restart, I verified live with two test messages:
- A message in the previously-trapped topic 573: no telegram topic recovery log line; reply landed in topic 573 correctly.
- A message creating a brand-new topic: no recovery line; reply landed in the new topic; SQLite binding row written for the new thread_id.
Relationship to #20470
This is a distinct bug from #20470. #20470 is about the durable binding row not being refreshed after a compression-induced session split (post-split state divergence). This one is about the binding row never being written in the first place because the inbound thread_id is rewritten before the binding code runs (pre-binding hijack). Both affect Telegram DM topic mode, but they fire in different parts of the inbound path and produce different user-visible symptoms.
Environment
- Hermes commit: HEAD 729a778af (local); bug also present on origin/main 7245bc77e.
- macOS, Telegram DM topic mode enabled.
- Bug introduced by commit ede47a54b (2026-05-15, "fix(gateway): pin Telegram DM-topic routing to user's current topic").
Happy to open a PR with the patch + updated test if that helps.