fix(gateway): keep topic bindings aligned after compression #29713
hehehe0803 wants to merge 5 commits into
Follow-up from local regression triage: this PR is related to a repeated-compaction loop in Telegram DM topic mode. The original fix synced topic bindings after outer agent-result session rotation, but mid-run compression can mutate the gateway session entry before that outer sync runs, so the durable topic binding can remain pointed at the oversized parent session.

This update syncs the Telegram topic binding immediately in the mid-run session-split path. If Telegram delivered the event without a lane thread id, it recovers the thread id from the old session binding before rotating it to the compressed child.

Local verification:
- ▶ running per-file parallel test suite via run_tests_parallel.py
  === Summary: 2 files, 117 tests passed, 0 failed (100% complete) in 3.9s (40 workers) === → 117 passed
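The mid-run sync with thread-id recovery described above might look roughly like the sketch below. This is an illustration only, not the PR's code: the `topic_bindings` table, its columns, and the function names `open_bindings_db` and `sync_binding_after_split` are all assumptions made for the example.

```python
import sqlite3
from typing import Optional


def open_bindings_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical bindings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topic_bindings ("
        " chat_id INTEGER NOT NULL,"
        " thread_id INTEGER NOT NULL,"
        " session_id TEXT NOT NULL,"
        " PRIMARY KEY (chat_id, thread_id))"
    )
    return conn


def sync_binding_after_split(
    conn: sqlite3.Connection,
    chat_id: int,
    thread_id: Optional[int],
    old_session_id: str,
    new_session_id: str,
) -> Optional[int]:
    """Point the topic binding at the compressed child session.

    If the inbound event carried no thread id, recover it from the row
    that still references the old (parent) session. Returns the thread id
    that was synced, or None when nothing could be recovered.
    """
    if thread_id is None:
        row = conn.execute(
            "SELECT thread_id FROM topic_bindings"
            " WHERE chat_id = ? AND session_id = ?",
            (chat_id, old_session_id),
        ).fetchone()
        if row is None:
            return None  # no binding to recover from; leave the table untouched
        thread_id = row[0]
    # The primary key is (chat_id, thread_id), so this overwrites the stale row.
    conn.execute(
        "INSERT OR REPLACE INTO topic_bindings (chat_id, thread_id, session_id)"
        " VALUES (?, ?, ?)",
        (chat_id, thread_id, new_session_id),
    )
    conn.commit()
    return thread_id
```

Doing the write at the split site, rather than waiting for the outer sync, is what keeps the durable row from lagging behind the in-memory session entry.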
ab32a4d to defcb2e
defcb2e to 838abd0
…hildren (#34409)

Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in SQLite so reopening a topic resumes the right Hermes session. When compression rotated session_entry.session_id mid-turn, the binding row stayed pointed at the pre-compression parent. On the next inbound message in that topic the gateway reloaded the oversized parent transcript, retriggering preflight compression, sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper called immediately after each of the three session_id rotation sites in _handle_message_with_agent (hygiene compression, agent-result compression rotation, /compress command). Keeps future bindings fresh.
2. Read-path self-heal: when resolving an existing topic binding, walk SessionDB.get_compression_tip() forward and switch_session to the descendant instead of the stored parent. Rewrites the binding row to the tip so subsequent messages skip the walk. Heals existing stale state on the next user message without requiring a gateway restart.

Skipped from competing PRs as not load-bearing for the bug:

- advance_session_after_compression SessionStore primitive (#26204/#28870/#33416): preserves end_reason='compression' analytics nicety but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch: _compress_context() already mutates tmp_agent.session_id on the cached object so the in-memory agent self-corrects.
- Startup repair pass (#33416): redundant once the read path heals on the next message; one-line CLI follow-up can address bindings for topics users never reopen.

Closes #20470, #29712, #33414.

Acknowledges work in #23195 (@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713 (@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
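The read-path self-heal can be pictured with the minimal sketch below. It uses plain dictionaries in place of the real SessionDB and binding store, so `compression_tip`, `resolve_binding`, and the `children` mapping are hypothetical stand-ins; the cycle guard is likewise an assumption added for safety, not something the commit message describes.

```python
from typing import Dict, Tuple

BindingKey = Tuple[int, int]  # (chat_id, thread_id)


def compression_tip(children: Dict[str, str], session_id: str) -> str:
    """Follow parent -> compressed-child links until no child exists."""
    seen = {session_id}
    current = session_id
    while current in children:
        nxt = children[current]
        if nxt in seen:  # defensive: stop if the chain loops back on itself
            break
        seen.add(nxt)
        current = nxt
    return current


def resolve_binding(
    bindings: Dict[BindingKey, str],
    children: Dict[str, str],
    key: BindingKey,
) -> str:
    """Return the session to resume, rewriting a stale binding to the tip."""
    stored = bindings[key]
    tip = compression_tip(children, stored)
    if tip != stored:
        bindings[key] = tip  # later lookups hit the tip directly, skipping the walk
    return tip


# Usage: a binding still pointing at the pre-compression parent "s1".
children = {"s1": "s2", "s2": "s3"}
bindings = {(1, 77): "s1"}
resumed = resolve_binding(bindings, children, (1, 77))
```

Because the rewrite happens on lookup, a stale row is repaired the first time the user sends a message in that topic, which matches the commit's point that no restart or startup repair pass is needed.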
Thanks for this work. The fix landed via PR #34409, merged as commit …

I reviewed all six PRs targeting #20470 / #29712 / #33414 and synthesized the load-bearing minimum: …

Skipped intentionally: the …

Closing as superseded; your work shaped the final design. Appreciate the thorough analysis.
Description
Fixes two gateway reliability edge cases observed in generalized gateway operation:
network-online.target, which may not exist or be reachable in a user manager. System-scope units keep the network-online ordering; user-scope units now avoid that dependency.

Fixes #29712.
Related to #29421.
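As a rough illustration of the scope-dependent unit generation, the sketch below renders a unit file that includes the network-online ordering only for system scope. The function name `render_gateway_unit`, the `ExecStart` command, and the `[Service]`/`[Install]` contents are assumptions for the example; only the `After=`/`Wants=` handling reflects the behavior described in this PR.

```python
def render_gateway_unit(
    scope: str,
    exec_start: str = "/usr/bin/env hermes-gateway",  # hypothetical command
) -> str:
    """Render a systemd unit; only system scope depends on network-online.target."""
    if scope not in ("system", "user"):
        raise ValueError(f"unknown scope: {scope!r}")

    unit = ["[Unit]", "Description=Gateway service (illustrative)"]
    if scope == "system":
        # Keep the network ordering for system units only.
        unit += ["After=network-online.target", "Wants=network-online.target"]

    service = ["[Service]", f"ExecStart={exec_start}", "Restart=on-failure"]

    # Assumed install targets: multi-user for system units, default for user units.
    wanted_by = "multi-user.target" if scope == "system" else "default.target"
    install = ["[Install]", f"WantedBy={wanted_by}"]

    return "\n".join(unit + [""] + service + [""] + install) + "\n"


system_unit = render_gateway_unit("system")
user_unit = render_gateway_unit("user")
```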
Changes
- _sync_telegram_topic_binding(...), scoped to Telegram topic lanes.
- session_entry.session_id;
- session_entry.session_id.
- oversized-parent-session -> compressed-child-session.
- After=network-online.target/Wants=network-online.target, while preserving those dependencies for system units.

Test plan
Privacy / OSS hygiene