fix(gateway): keep topic bindings aligned after compression #29713

Closed
hehehe0803 wants to merge 5 commits into NousResearch:main from hehehe0803:fix/gateway-topic-compression-binding

@hehehe0803

Contributor

Description

Fixes two gateway reliability edge cases observed in generalized gateway operation:

  1. Telegram DM topic-mode bindings were not updated after compression/session rotation. A topic could remain bound to the oversized parent session, so the next message in the topic restored the stale parent and could trigger repeated compaction.
  2. User-scope systemd gateway units could wait on network-online.target, which may not exist or be reachable in a user manager. System-scope units keep the network-online ordering; user-scope units now avoid that dependency.
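The scope split in item 2 can be sketched as a small unit renderer. This is a minimal illustration built around a hypothetical `render_gateway_unit(scope, exec_start)` helper. It is not the PR's generator in `hermes_cli/gateway.py`, and the description, restart policy, install targets, and `ExecStart` path are placeholder choices. Only the network-online handling mirrors the change described above.

```python
def render_gateway_unit(scope: str, exec_start: str) -> str:
    """Render a minimal gateway unit file for the given systemd scope.

    Hypothetical helper: only the network-online handling mirrors the PR.
    System units keep the network-online ordering; user units omit it,
    because a user manager may not provide or reach that target.
    """
    if scope not in ("system", "user"):
        raise ValueError(f"unknown scope: {scope!r}")

    unit = ["[Unit]", "Description=Gateway (example)"]
    if scope == "system":
        unit += ["After=network-online.target", "Wants=network-online.target"]

    service = ["[Service]", f"ExecStart={exec_start}", "Restart=on-failure"]

    # User units are conventionally enabled under default.target,
    # system units under multi-user.target.
    wanted_by = "multi-user.target" if scope == "system" else "default.target"
    install = ["[Install]", f"WantedBy={wanted_by}"]

    return "\n".join(unit + [""] + service + [""] + install) + "\n"


if __name__ == "__main__":
    # Placeholder ExecStart path; not the project's real command line.
    print(render_gateway_unit("user", "/opt/example/bin/gateway"))
```

Keeping the dependency decision in one branch on `scope` means system units cannot silently lose the ordering while user units drop it.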

Fixes #29712.
Related to #29421.

Changes

  • Add _sync_telegram_topic_binding(...), scoped to Telegram topic lanes.
  • Sync the topic binding after:
    • hygiene compression changes session_entry.session_id;
    • agent-returned compression/session split changes session_entry.session_id.
  • Add a regression test proving the topic binding follows oversized-parent-session -> compressed-child-session.
  • Generate user systemd units without After=network-online.target / Wants=network-online.target, while preserving those dependencies for system units.
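The binding sync can be sketched standalone. This sketch uses hypothetical `EventSource`, `SessionEntry`, and `TopicBindings` types and an in-memory dict in place of the persisted table. It is loosely modeled on the PR's `_sync_telegram_topic_binding(...)` but is not that code. The keyword-only `reason` is used only for logging here.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

logger = logging.getLogger("gateway.topic_binding_sketch")


@dataclass
class EventSource:
    """Hypothetical stand-in for the inbound event source."""
    platform: str
    chat_id: str
    thread_id: Optional[str] = None


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's per-lane session entry."""
    session_id: str


@dataclass
class TopicBindings:
    """In-memory stand-in for the durable (chat_id, thread_id) -> session_id table."""
    rows: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def bind(self, chat_id: str, thread_id: str, session_id: str) -> None:
        self.rows[(chat_id, thread_id)] = session_id

    def lookup(self, chat_id: str, thread_id: str) -> Optional[str]:
        return self.rows.get((chat_id, thread_id))


def sync_telegram_topic_binding(
    bindings: TopicBindings, source: EventSource, entry: SessionEntry, *, reason: str
) -> bool:
    """Point the topic binding at entry.session_id; return True if it changed.

    No-op outside Telegram topic lanes (other platforms, or no thread id).
    """
    if source.platform != "telegram" or source.thread_id is None:
        return False
    current = bindings.lookup(source.chat_id, source.thread_id)
    if current == entry.session_id:
        return False  # already aligned, nothing to write
    logger.debug("rebinding topic %s/%s: %s -> %s (%s)",
                 source.chat_id, source.thread_id, current, entry.session_id, reason)
    bindings.bind(source.chat_id, source.thread_id, entry.session_id)
    return True


if __name__ == "__main__":
    # Mirrors the regression scenario: oversized parent -> compressed child.
    bindings = TopicBindings()
    source = EventSource("telegram", "chat-1", "thread-7")
    entry = SessionEntry("parent-session")
    bindings.bind(source.chat_id, source.thread_id, entry.session_id)

    entry.session_id = "child-session"  # compression rotated the id
    sync_telegram_topic_binding(bindings, source, entry, reason="hygiene compression")
    print(bindings.lookup("chat-1", "thread-7"))  # child-session
```

Calling the helper right after each `session_id` mutation, rather than once at the end of the turn, is what keeps the durable row from lagging behind the live session entry.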

Test plan

python -m py_compile gateway/run.py hermes_cli/gateway.py tests/gateway/test_telegram_topic_mode.py tests/hermes_cli/test_gateway_service.py
scripts/run_tests.sh \
  tests/gateway/test_telegram_topic_mode.py::test_topic_binding_follows_session_id_rotation_after_compression \
  tests/gateway/test_telegram_topic_mode.py::test_managed_topic_binding_reuses_restored_session_over_static_lane_session \
  tests/hermes_cli/test_gateway_service.py::TestGeneratedSystemdUnits::test_user_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout \
  tests/hermes_cli/test_gateway_service.py::TestGeneratedSystemdUnits::test_system_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout \
  -q
# 4 passed

scripts/run_tests.sh tests/gateway/test_telegram_topic_mode.py -q
# 43 passed

Privacy / OSS hygiene

  • Uses synthetic session IDs and test chat/thread IDs only.
  • Does not include real chat IDs, hostnames, local paths, private network addresses, tokens, or account-specific details.

@daimon-nous daimon-nous Bot added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter labels May 21, 2026
@hehehe0803

Contributor Author

Follow-up from local regression triage: this PR is related to a repeated-compaction loop in Telegram DM topic mode. The original fix synced topic bindings after outer agent-result session rotation, but mid-run compression can mutate the gateway session entry before that outer sync runs, so the durable topic binding can remain pointed at the oversized parent session.

This update syncs the Telegram topic binding immediately in the mid-run session-split path, and if Telegram delivered the event without a lane thread id, it recovers the thread id from the old session binding before rotating it to the compressed child.

Local verification:

▶ running per-file parallel test suite via run_tests_parallel.py
(TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; clean env)
Discovered 2 test files (117 tests) under ['tests/gateway/test_telegram_topic_mode.py', 'tests/gateway/test_session.py']; running with -j 40
[ 63.2% | 74/117 | ✓74 | ✗ 0] ✓ tests/gateway/test_session.py (74✓, 1.5s)
[100.0% | 117/117 | ✓117 | ✗ 0] ✓ tests/gateway/test_telegram_topic_mode.py (43✓, 3.9s)

=== Summary: 2 files, 117 tests passed, 0 failed (100% complete) in 3.9s (40 workers) === → 117 passed
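A minimal sketch of the recover-then-rotate step described in this follow-up. It models bindings as a plain dict keyed by `(chat_id, thread_id)`. `recover_thread_id` and `rebind_after_mid_run_split` are hypothetical names. Trusting only a single, unambiguous match is this sketch's own rule, not something the comment states.

```python
from typing import Dict, Optional, Tuple

# Hypothetical shape of the durable binding table: (chat_id, thread_id) -> session_id.
Bindings = Dict[Tuple[str, str], str]


def recover_thread_id(bindings: Bindings, chat_id: str, old_session_id: str) -> Optional[str]:
    """Find the topic thread in this chat whose binding points at old_session_id.

    Only an unambiguous single match is trusted; otherwise return None.
    """
    matches = [thread for (chat, thread), sid in bindings.items()
               if chat == chat_id and sid == old_session_id]
    return matches[0] if len(matches) == 1 else None


def rebind_after_mid_run_split(
    bindings: Bindings,
    chat_id: str,
    thread_id: Optional[str],
    old_session_id: str,
    new_session_id: str,
) -> Optional[str]:
    """Rotate the topic binding to the compressed child right after the split.

    If the event arrived without a lane thread id, recover it from the old
    session's binding first. Returns the thread id that was rebound, or None.
    """
    if thread_id is None:
        thread_id = recover_thread_id(bindings, chat_id, old_session_id)
        if thread_id is None:
            return None  # nothing we can safely rebind
    bindings[(chat_id, thread_id)] = new_session_id
    return thread_id


if __name__ == "__main__":
    bindings: Bindings = {("chat-1", "thread-7"): "parent", ("chat-1", "thread-9"): "other"}
    # Event delivered without a thread id; recover it from the parent's binding.
    print(rebind_after_mid_run_split(bindings, "chat-1", None, "parent", "child"))  # thread-7
    print(bindings[("chat-1", "thread-7")])  # child
```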

@hehehe0803 hehehe0803 force-pushed the fix/gateway-topic-compression-binding branch from ab32a4d to defcb2e May 24, 2026 06:14
@hehehe0803 hehehe0803 force-pushed the fix/gateway-topic-compression-binding branch from defcb2e to 838abd0 May 26, 2026 12:05
teknium1 added a commit that referenced this pull request May 29, 2026
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.
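The read-path self-heal in item 2 can be sketched with the standard-library `sqlite3` module. The schema, the `reason` column that marks compression children, and both function names are hypothetical. The real code walks `SessionDB.get_compression_tip()`, whose internals are not shown here. The walk caps its hop count and stops on cycles, so a corrupted parent chain cannot loop forever.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: a compression child records its parent and reason.
SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_id TEXT, reason TEXT);
CREATE TABLE topic_bindings (
    chat_id TEXT, thread_id TEXT, session_id TEXT,
    PRIMARY KEY (chat_id, thread_id)
);
"""


def compression_tip(conn: sqlite3.Connection, session_id: str, max_hops: int = 64) -> str:
    """Follow compression children forward until a session has none."""
    current, seen = session_id, {session_id}
    for _ in range(max_hops):
        row = conn.execute(
            "SELECT id FROM sessions WHERE parent_id = ? AND reason = 'compression' "
            "ORDER BY rowid DESC LIMIT 1",
            (current,),
        ).fetchone()
        if row is None or row[0] in seen:
            break  # reached the tip, or hit a cycle
        current = row[0]
        seen.add(current)
    return current


def resolve_topic_session(conn: sqlite3.Connection, chat_id: str, thread_id: str) -> Optional[str]:
    """Return the session for a topic, healing a stale binding along the way."""
    row = conn.execute(
        "SELECT session_id FROM topic_bindings WHERE chat_id = ? AND thread_id = ?",
        (chat_id, thread_id),
    ).fetchone()
    if row is None:
        return None
    stored = row[0]
    tip = compression_tip(conn, stored)
    if tip != stored:
        with conn:  # commit the rewrite so later messages skip the walk
            conn.execute(
                "UPDATE topic_bindings SET session_id = ? WHERE chat_id = ? AND thread_id = ?",
                (tip, chat_id, thread_id),
            )
    return tip


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [("parent", None, None), ("child", "parent", "compression"),
         ("grandchild", "child", "compression")],
    )
    conn.execute("INSERT INTO topic_bindings VALUES ('chat-1', 'thread-7', 'parent')")
    print(resolve_topic_session(conn, "chat-1", "thread-7"))  # grandchild
```

Filtering on compression children only, rather than any child, keeps the walk from jumping into unrelated forks of the same parent session.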

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
  #28870/#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (#33416) — redundant once the read path heals on
  the next message; one-line CLI follow-up can address bindings for
  topics users never reopen.

Closes #20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
@teknium1

Contributor

Thanks for this work; the fix landed via PR #34409, merged as commit db96fc60d.

I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum:

  1. Your _sync_telegram_topic_binding helper pattern after each session_id rotation site (Family A, this PR's most direct ancestor).
  2. The compression-tip read-path self-heal from @bizyumov's PRs (Family B), so existing already-stale bindings recover on the next message without a restart.

Skipped intentionally: the advance_session_after_compression SessionStore primitive (analytics nicety), explicit cached-agent eviction (_compress_context() already mutates tmp_agent.session_id on the cached object), and the startup repair pass (redundant once the read path self-heals).

Closing as superseded — your work shaped the final design. Appreciate the thorough analysis.

@teknium1 teknium1 closed this May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Telegram topic bindings can point at pre-compression sessions

2 participants