
[Bug] _recover_telegram_topic_thread_id hijacks every brand-new Telegram DM topic into the previous topic #31086

@dillweed


Describe the bug

In Telegram DM topic mode, _recover_telegram_topic_thread_id in gateway/run.py rewrites the inbound thread_id of every brand-new topic to the user's most-recently-bound topic, hijacking the new conversation into the previous lane. The hijack is self-reinforcing: because the rewrite happens before _record_telegram_topic_binding, the new topic's binding row is never written, so the next inbound also looks "unknown" and is hijacked again. The freshly-created topic never recovers on its own.

User-visible symptoms:

  • "I type in topic X, but the reply appears in topic Y."
  • "I sent several messages and got no reply at all" (the agent is busy on, or was interrupted on, the wrong lane, or has compressed/rolled-back state).
  • The first message in any new topic is dropped into whatever topic was last active.

Reproduction Steps

  1. Enable Telegram DM topic mode and use it long enough to accumulate at least one topic binding.
  2. From "All Messages" (or by clicking "New Chat" in the topic strip), open a brand-new topic and send a message in it.
  3. Expected: the agent replies in the new topic and a binding row is written for the new thread_id.
  4. Actual: the gateway logs "telegram topic recovery: chat=... user=... '<new_thread>' -> <last_active_thread>", the message is processed against the previous topic's session, and the reply appears in the previous topic. The new topic has no binding row.
  5. Subsequent messages in the new topic continue to be hijacked.

Root cause

gateway/run.py _recover_telegram_topic_thread_id (introduced in commit ede47a54b, "fix(gateway): pin Telegram DM-topic routing to user's current topic"):

inbound = str(source.thread_id or "")
is_lobby = not inbound or inbound in self._TELEGRAM_GENERAL_TOPIC_IDS
known = {str(b.get("thread_id") or "") for b in bindings}
if not is_lobby and inbound in known:
    return None
# ... falls through and rewrites to the user's most-recent binding

The commit was intended to address two real Telegram quirks: (a) "Reply on a message in another topic" leaks the other topic's message_thread_id, and (b) _build_message_event strips thread_id on plain replies (#3206 — required for non-topic users). Both of those legitimately produce wrong/missing thread_id values and should be recovered.

The bug is that the function does not distinguish between:

  • A cross-topic-Reply leak (rare — needs the user to long-press-reply onto a message in another topic), and
  • A brand-new topic the user just opened (common — every "New Chat" creates one).

Both look identical to the function: an explicit, non-lobby thread_id that isn't yet in telegram_dm_topic_bindings. The "unknown topic → snap to most-recent" arm treats every fresh topic as a leak.

The trap closes on itself: _recover_telegram_topic_thread_id runs before _record_telegram_topic_binding, so the hijacked thread_id is what gets bound. The original new topic never gets a binding row, so the next message also matches the "unknown" arm, and so on.
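The self-reinforcing loop can be reproduced with a small model of the inbound path. This is not the gateway/run.py code: recover_buggy, record_binding, handle_inbound and GENERAL_TOPIC_IDS are hypothetical names, bindings are assumed to be a newest-first list of dicts, and the elided fall-through is assumed to be the same newest-first user loop that appears in the proposed fix. It only illustrates why recovering before recording never lets the new topic acquire a binding.

```python
# Hypothetical, simplified model of the pre-fix inbound path (not the real code).
GENERAL_TOPIC_IDS = {"1"}  # assumption: stand-in for _TELEGRAM_GENERAL_TOPIC_IDS


def recover_buggy(inbound_thread, user_id, bindings):
    """Pre-fix logic: a non-lobby thread_id missing from bindings gets rewritten."""
    inbound = str(inbound_thread or "")
    is_lobby = not inbound or inbound in GENERAL_TOPIC_IDS
    known = {str(b.get("thread_id") or "") for b in bindings}
    if not is_lobby and inbound in known:
        return None
    # Assumed fall-through: snap to the user's most recent binding.
    for b in bindings:  # newest-first
        if str(b.get("user_id") or "") == user_id:
            recovered = str(b.get("thread_id") or "")
            if recovered and recovered != inbound:
                return recovered
            return None
    return None


def record_binding(thread_id, user_id, bindings):
    """Upsert a binding and move it to the front so the list stays newest-first."""
    bindings[:] = [b for b in bindings if b["thread_id"] != thread_id]
    bindings.insert(0, {"thread_id": thread_id, "user_id": user_id})


def handle_inbound(thread_id, user_id, bindings):
    # Recovery runs BEFORE the binding is recorded, which is what closes the trap.
    effective = recover_buggy(thread_id, user_id, bindings) or thread_id
    record_binding(effective, user_id, bindings)
    return effective


bindings = [{"thread_id": "563", "user_id": "313975948"}]
routed = [handle_inbound("573", "313975948", bindings) for _ in range(3)]
print(routed)                               # ['563', '563', '563']
print([b["thread_id"] for b in bindings])   # ['563']: topic 573 never gets a row
```

Because the rewritten id is what gets recorded, the set of known topics never grows to include 573, so every later message from that topic takes the same "unknown" branch.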

Evidence from a live instance

Every message in a freshly-opened topic this morning got rewritten to the previously-active topic:

gateway.log:
07:27:20  telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
07:50:45  telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
08:01:42  telegram topic recovery: chat=313975948 user=313975948 '573' -> 563
... (every message from topic 573 redirected to 563 for hours) ...
10:45:17  telegram topic recovery: chat=313975948 user=313975948 '652' -> 563   # new topic, same hijack

State divergence after the hijack ran for hours:

sessions.json     : agent:main:telegram:dm:313975948:573 → 20260523_073347_9a3c4a07   (created on first send, then orphaned)
state.db bindings : (no row for thread_id=573)
state.db bindings : thread=563 → 20260523_093515_852a1b                              # all of topic 573's messages landed here

Topic 573 had a SessionStore JSON entry from the moment its first message arrived (07:33:47), but its telegram_dm_topic_bindings row was never written because every message in that topic was rerouted to 563 in _build_message_event post-processing before binding creation. Session 20260523_073347_9a3c4a07 for topic 573 has message_count = 0 despite the user sending many messages into that topic.
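One way to check whether an install is in this diverged state is to compare topic session keys against the binding table. The helper below is a hypothetical sketch, assuming topic keys have the form agent:main:telegram:dm:<chat_id>:<thread_id> (as in the sessions.json line above) and that telegram_dm_topic_bindings has chat_id and thread_id columns. How the keys are read out of sessions.json is left to the caller, since its exact layout is not shown in this issue.

```python
import sqlite3

KEY_PREFIX = "agent:main:telegram:dm:"  # assumption: DM topic session-key prefix


def find_orphaned_topics(session_keys, conn):
    """Return (chat_id, thread_id) pairs that have a session key but no binding row."""
    bound = {
        (str(chat), str(thread))
        for chat, thread in conn.execute(
            "SELECT chat_id, thread_id FROM telegram_dm_topic_bindings"
        )
    }
    orphans = []
    for key in session_keys:
        if not key.startswith(KEY_PREFIX):
            continue
        parts = key[len(KEY_PREFIX):].split(":")
        if len(parts) != 2:
            continue  # assumption: topic keys end in exactly <chat_id>:<thread_id>
        chat_id, thread_id = parts
        if (chat_id, thread_id) not in bound:
            orphans.append((chat_id, thread_id))
    return orphans


# Demo against an in-memory table that mirrors the state reported above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telegram_dm_topic_bindings (chat_id TEXT, thread_id TEXT)")
conn.execute("INSERT INTO telegram_dm_topic_bindings VALUES ('313975948', '563')")
keys = [
    "agent:main:telegram:dm:313975948:573",
    "agent:main:telegram:dm:313975948:563",
]
print(find_orphaned_topics(keys, conn))  # [('313975948', '573')]
```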

Proposed fix

Drop the "unknown topic → snap back" arm. An explicit, non-lobby thread_id must be trusted as-is. The legitimate cross-topic-Reply leak case (the rarer of the two) self-corrects on the next message the user sends in the right topic — a one-message inconvenience is much cheaper than permanently trapping every new topic.

inbound = str(source.thread_id or "")
is_lobby = not inbound or inbound in self._TELEGRAM_GENERAL_TOPIC_IDS
if not is_lobby:
    # Only rewrite when the inbound id is missing/lobby. An explicit,
    # non-lobby thread_id must be trusted as-is even when it isn't in
    # our bindings table — a brand-new topic the user just created has
    # no binding row yet, and rewriting it to the most-recent topic
    # traps every fresh topic against the previous one.
    return None
user_id = str(source.user_id)
for b in bindings:  # newest-first
    if str(b.get("user_id") or "") == user_id:
        recovered = str(b.get("thread_id") or "")
        if recovered and recovered != inbound:
            return recovered
        return None
return None

This preserves the original genuine win — snap a stripped/lobby thread_id back to the user's current topic — and removes the over-corrective arm.
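For a quick check of the intended behaviour, here is the proposed logic lifted out of the method into a standalone function. recover_fixed is a hypothetical name, GENERAL_TOPIC_IDS = {"1"} is an assumed stand-in for self._TELEGRAM_GENERAL_TOPIC_IDS (Telegram's General topic is usually thread 1), and bindings is taken to be newest-first, as the comment in the patch says.

```python
GENERAL_TOPIC_IDS = {"1"}  # assumption: stand-in for self._TELEGRAM_GENERAL_TOPIC_IDS


def recover_fixed(inbound_thread, user_id, bindings):
    """Rewrite only missing/lobby thread ids; trust any explicit non-lobby id."""
    inbound = str(inbound_thread or "")
    is_lobby = not inbound or inbound in GENERAL_TOPIC_IDS
    if not is_lobby:
        # Known or not, an explicit topic id is used as-is.
        return None
    for b in bindings:  # newest-first
        if str(b.get("user_id") or "") == str(user_id):
            recovered = str(b.get("thread_id") or "")
            if recovered and recovered != inbound:
                return recovered
            return None
    return None


bindings = [{"thread_id": "563", "user_id": "313975948"}]
print(recover_fixed("573", "313975948", bindings))  # None: new topic kept as-is
print(recover_fixed(None, "313975948", bindings))   # 563: stripped id snapped back
print(recover_fixed("1", "313975948", bindings))    # 563: General topic snapped back
```

The trade-off is the one described above: a leaked cross-topic Reply id is now trusted for that single message, in exchange for new topics routing correctly from their first message.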

Local repair for already-affected installs

For topics that already have a SessionStore entry but no SQLite binding row (i.e., the hijack trapped them):

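INSERT INTO telegram_dm_topic_bindings
  (chat_id, thread_id, user_id, session_key, session_id, managed_mode, linked_at, updated_at)
VALUES
  (?, ?, ?, ?, ?, 'auto', strftime('%s','now'), strftime('%s','now'));

Here session_key is agent:main:telegram:dm:<chat_id>:<thread_id>, bound as a single parameter (SQLite does not substitute ? placeholders inside a quoted string literal, so 'agent:main:telegram:dm:?:?' would be stored verbatim), and session_id is the entry already present in ~/.hermes/sessions/sessions.json for that topic key.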
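The repair can also be scripted with Python's sqlite3 module. This is a sketch under the same assumptions as the SQL repair (column names, managed_mode 'auto', epoch timestamps via strftime); repair_binding is a hypothetical helper that skips topics which already have a row, so running it twice is harmless.

```python
import sqlite3

# session_key is bound as one parameter: SQLite does not substitute
# ? placeholders that appear inside a quoted string literal.
REPAIR_SQL = """
INSERT INTO telegram_dm_topic_bindings
  (chat_id, thread_id, user_id, session_key, session_id, managed_mode, linked_at, updated_at)
VALUES
  (?, ?, ?, ?, ?, 'auto', strftime('%s','now'), strftime('%s','now'))
"""


def repair_binding(conn, chat_id, thread_id, user_id, session_id):
    """Hypothetical helper: insert a missing binding row; return False if one exists."""
    exists = conn.execute(
        "SELECT 1 FROM telegram_dm_topic_bindings WHERE chat_id = ? AND thread_id = ?",
        (chat_id, thread_id),
    ).fetchone()
    if exists:
        return False
    session_key = f"agent:main:telegram:dm:{chat_id}:{thread_id}"
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(REPAIR_SQL, (chat_id, thread_id, user_id, session_key, session_id))
    return True


# Demo against an in-memory table with the assumed column set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telegram_dm_topic_bindings (chat_id TEXT, thread_id TEXT, user_id TEXT,"
    " session_key TEXT, session_id TEXT, managed_mode TEXT, linked_at, updated_at)"
)
print(repair_binding(conn, "313975948", "573", "313975948", "20260523_073347_9a3c4a07"))  # True
print(repair_binding(conn, "313975948", "573", "313975948", "20260523_073347_9a3c4a07"))  # False
```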

Verification

I patched the function locally as above and updated the relevant regression test in tests/gateway/test_telegram_topic_mode.py (the previously-passing test_recover_rewrites_unknown_thread_id_to_most_recent test encoded the buggy behaviour; it was renamed to test_recover_leaves_unknown_explicit_thread_id_alone and inverted). All other tests in the file still pass:

$ python -m pytest tests/gateway/test_telegram_topic_mode.py -q
44 passed

After the gateway restart, I verified live with two test messages:

  • A message in the previously-trapped topic 573: no telegram topic recovery log line; reply landed in topic 573 correctly.
  • A message creating a brand-new topic: no recovery line; reply landed in the new topic; SQLite binding row written for the new thread_id.

Relationship to #20470

This is a distinct bug from #20470. #20470 is about the durable binding row not being refreshed after a compression-induced session split (post-split state divergence). This one is about the binding row never being written in the first place because the inbound thread_id is rewritten before the binding code runs (pre-binding hijack). Both affect Telegram DM topic mode, but they fire in different parts of the inbound path and produce different user-visible symptoms.

Environment

  • Hermes commit: HEAD 729a778af (local), bug also present on origin/main 7245bc77e.
  • macOS, Telegram DM topic mode enabled.
  • Bug introduced by commit ede47a54b (2026-05-15, "fix(gateway): pin Telegram DM-topic routing to user's current topic").

Happy to open a PR with the patch + updated test if that helps.

Metadata

Assignees: No one assigned
Labels: P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter), type/bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests