feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+) (salvage of #3412) by teknium1 · Pull Request #23512 · NousResearch/hermes-agent

teknium1 · 2026-05-11T02:14:17Z

Summary

Telegram DMs now stream the agent's reply token-by-token using Telegram's native sendMessageDraft API (Bot API 9.5, March 2026). Smooth animated previews replace the roughly once-per-second editMessageText updates that made slow models (o1, deepseek-r1) feel sluggish in the bot UI.

The default streaming transport flips from edit to auto. Groups, supergroups, forum topics, and every non-Telegram platform continue to use the edit-based path with no behaviour change.

How it works

DM stream begins  → sendMessageDraft(draft_id=N, text="")    starts the bubble
deltas arrive     → sendMessageDraft(draft_id=N, text=...)   animates the preview
turn ends         → sendMessage(text=final)                  draft clears naturally
                                                             user sees the real answer
tool boundary     → finalize current text as a real send +   no inter-tool leak
                    bump draft_id for the next text segment
group / topic     → adapter.supports_draft_streaming = False → edit-based path
draft frame fails → disable drafts for the rest of the run  → edit-based fallback
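To make the lifecycle above concrete, here is a minimal Python sketch against a recording stub rather than the real Bot API. `FakeTelegram` and `DraftStreamSketch` are hypothetical names; the real consumer allocates draft ids from a class-wide counter and throttles frames, both of which this sketch simplifies to a plain increment and one frame per delta.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeTelegram:
    """Stand-in that records calls instead of hitting the Bot API."""
    calls: List[Tuple] = field(default_factory=list)

    def send_message_draft(self, draft_id: int, text: str) -> None:
        self.calls.append(("draft", draft_id, text))

    def send_message(self, text: str) -> None:
        self.calls.append(("send", text))


class DraftStreamSketch:
    """Hypothetical DM consumer following the draft lifecycle."""

    def __init__(self, bot: FakeTelegram, draft_id: int = 1) -> None:
        self.bot = bot
        self.draft_id = draft_id
        self.text = ""
        self.bot.send_message_draft(self.draft_id, "")  # stream begins: open the bubble

    def on_delta(self, delta: str) -> None:
        # Same draft_id on every frame, so the client animates one preview.
        self.text += delta
        self.bot.send_message_draft(self.draft_id, self.text)

    def on_tool_boundary(self) -> None:
        # Finalize the pre-tool text as a real message, then bump draft_id so
        # the post-tool text animates as a separate preview.
        if self.text:
            self.bot.send_message(self.text)
        self.text = ""
        self.draft_id += 1

    def finish(self) -> None:
        # Turn ends: a real sendMessage delivers the answer; the draft clears.
        if self.text:
            self.bot.send_message(self.text)
        self.text = ""


bot = FakeTelegram()
stream = DraftStreamSketch(bot)
stream.on_delta("Hel")
stream.on_delta("lo")
stream.on_tool_boundary()
stream.on_delta("Done")
stream.finish()
print(bot.calls)
```

Running it records one animated draft per text segment and a real send at each boundary, which is the ordering the diagram describes.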

Changes

Area	What
`gateway/platforms/base.py`	New adapter contract: `supports_draft_streaming(chat_type, metadata)` and `send_draft(...)`. Default impls return False / NotImplementedError so existing adapters are unaffected.
`gateway/platforms/telegram.py`	Telegram override: gates drafts on DM chat type AND PTB 22.6+ capability, routes `send_draft` to PTB's `Bot.send_message_draft` with UTF-16 length trimming, returns `SendResult(success=False, error=...)` on any failure for the consumer to handle.
`gateway/stream_consumer.py`	`StreamConsumerConfig.transport` (`auto|draft|edit|off`) and `chat_type`. Resolves `_use_draft_streaming` once per run, allocates a class-wide monotonic `draft_id`, and routes mid-stream frames through `_send_draft_frame` instead of `edit_message`. Per-response fallback on any draft failure. Tool-boundary `_reset_segment_state` bumps `draft_id` for the next segment. Drafts intentionally do NOT set `_already_sent` so the gateway's final-send path still fires (drafts have no `message_id`).
`gateway/config.py`	`StreamingConfig.transport` default `edit` → `auto`. Documented all four options.
`gateway/run.py`	Both `GatewayStreamConsumer` construction sites now pass `transport` and `chat_type` through.
`tests/gateway/test_stream_consumer_draft.py`	11 new tests covering transport selection, happy path, group fallback, draft-failure fallback, draft_id lifecycle (per-response distinct, tool-boundary bump), and the `_already_sent` invariant.
`website/docs/user-guide/messaging/telegram.md`	New `Streaming transport` section explaining the four options, the DM-only constraint, and per-response fallback behaviour.
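The two new `StreamConsumerConfig` fields can be pictured as a small frozen dataclass. This is a hypothetical sketch: the class name, the `"auto"` default on this dataclass, and the eager validation in `__post_init__` are assumptions; the PR only states the field names, the four allowed values, and that `StreamingConfig.transport` now defaults to `auto`.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

Transport = Literal["auto", "draft", "edit", "off"]
VALID_TRANSPORTS = frozenset(get_args(Transport))


@dataclass(frozen=True)
class StreamConsumerConfigSketch:
    """Hypothetical subset of StreamConsumerConfig: only the two new fields."""

    transport: Transport = "auto"    # assumed to mirror StreamingConfig's new default
    chat_type: Optional[str] = None  # e.g. "dm" or "group"; None when unknown

    def __post_init__(self) -> None:
        # Reject typos early instead of silently streaming with an unknown mode.
        if self.transport not in VALID_TRANSPORTS:
            raise ValueError(
                f"transport must be one of {sorted(VALID_TRANSPORTS)}, "
                f"got {self.transport!r}"
            )


cfg = StreamConsumerConfigSketch(chat_type="dm")
print(cfg.transport)  # auto
```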

Salvage notes

PR #3412 by @NivOO5 was branched against a March 27 base. Current stream_consumer.py is ~4× larger and has a different state machine (_NEW_SEGMENT, _COMMENTARY, finalize, _on_new_message, fresh-final logic). A direct cherry-pick was infeasible.

This PR re-authors the feature against current main and preserves @NivOO5's attribution on the substantive commit (re-attributed from Alfred <alfred@Alfreds-Mac-mini.local>, an author identity derived from the contributor's local hostname). The design is faithful to the original proposal (DM-only, edit fallback, transport=auto|draft|edit|off), with two improvements based on the openclaw rollout experience:

Per-response monotonic draft_id counter instead of a time.monotonic()*1e6 hash. The original PR's hash had a small but real collision risk across consecutive responses to the same chat. The counter is collision-free and easier to reason about in tests.
Tool-boundary draft_id bump. When the agent does text → tool → text, naive draft handling produces visible bleed-through across tool calls; openclaw documented this in their issue #32535 after they shipped. Bumping draft_id on each segment break makes each text segment animate as its own preview below the tool-progress bubble. The pre-tool text is finalized as a real sendMessage (the existing first-send path), and the post-tool text starts a fresh draft.
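A class-wide counter of the kind described above can be sketched with `itertools.count`. `DraftIdAllocator` and `ResponseSketch` are hypothetical names, and the lock is a defensive assumption in case ids are requested from several threads; the point is that ids are unique within the process and bump at each tool boundary.

```python
import itertools
import threading


class DraftIdAllocator:
    """Hypothetical class-wide monotonic draft_id source.

    Every consumer shares one counter, so two responses can never receive the
    same id within a process. An id derived from time.monotonic() * 1e6 can
    repeat when two responses start within the same clock tick.
    """

    _counter = itertools.count(1)
    _lock = threading.Lock()  # guard next() in case of multi-threaded callers

    @classmethod
    def next_id(cls) -> int:
        with cls._lock:
            return next(cls._counter)


class ResponseSketch:
    """One streamed response; draft_id changes per response and per segment."""

    def __init__(self) -> None:
        self.draft_id = DraftIdAllocator.next_id()

    def reset_segment_state(self) -> None:
        # Tool boundary: the next text segment gets its own preview bubble.
        self.draft_id = DraftIdAllocator.next_id()


first, second = ResponseSketch(), ResponseSketch()
before = first.draft_id
first.reset_segment_state()
print(before, second.draft_id, first.draft_id)  # strictly increasing
```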

The exception handling in the helper is also tightened. The original PR caught bare Exception inside the helper; the salvage instead tracks failures with the consumer's _draft_failures counter and disables drafts for the rest of the run on any non-success result, logging at debug level.
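The split of responsibilities described here (the adapter reports failure instead of raising; the consumer counts failures and turns drafts off) might look like the following sketch. `SendResult` is a simplified stand-in carrying only the fields the PR mentions, and `send_draft_safely` / `FallbackSketch` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("draft_fallback_sketch")


@dataclass
class SendResult:
    success: bool
    message_id: Optional[int] = None
    error: Optional[str] = None


def send_draft_safely(call: Callable[[], None]) -> SendResult:
    """Adapter side: turn any exception into a failed SendResult instead of raising."""
    try:
        call()
    except Exception as exc:  # transient network, server reject, capability gap
        return SendResult(success=False, error=str(exc))
    return SendResult(success=True, message_id=None)  # drafts have no message_id


class FallbackSketch:
    """Consumer side: the first non-success result disables drafts for the run."""

    def __init__(self) -> None:
        self.use_draft_streaming = True
        self.draft_failures = 0

    def handle(self, result: SendResult) -> None:
        if result.success:
            return
        self.draft_failures += 1
        self.use_draft_streaming = False
        log.debug("draft frame failed (%s); falling back to edit", result.error)


def flaky_network() -> None:
    raise TimeoutError("timed out")


consumer = FallbackSketch()
consumer.handle(send_draft_safely(lambda: None))   # success: drafts stay on
consumer.handle(send_draft_safely(flaky_network))  # failure: edit path from here on
print(consumer.use_draft_streaming, consumer.draft_failures)  # False 1
```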

Validation

tests/gateway/test_stream_consumer* + test_telegram_thread_fallback.py + test_dm_topics.py + test_telegram_approval_buttons.py + test_telegram_model_picker.py: 190/190 pass.
Full tests/gateway/ (excluding 4 pre-existing failures unrelated to this PR — test_tts_media_routing.py and test_update_streaming.py::test_recognized_slash_command_bypasses_pending_update_prompt, all confirmed present on origin/main without this PR): 5203/5203 pass.

Compatibility

Default transport auto is backwards-compatible. It routes to draft only when the adapter declares support for the specific chat type — Telegram only declares support for DMs. Every other chat type (groups, supergroups, forum topics) and every other platform continues to use the edit-based path.
PTB 22.6+ is required for drafts. Older installs cause supports_draft_streaming to return False, falling back to edit. Upgrading is not a hard deployment requirement.
No new env vars. Configuration lives in ~/.hermes/config.yaml under gateway.streaming.transport.
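The capability gate behind this compatibility story can be approximated with feature detection on the bot object. This is a sketch, not the Telegram adapter's code: the function name is hypothetical, and it assumes that, as the PR states, only PTB 22.6+ exposes `Bot.send_message_draft`, so `getattr` on an older bot finds nothing.

```python
from typing import Any, Mapping, Optional


def telegram_supports_draft_streaming(
    bot: Any,
    chat_type: Optional[str],
    metadata: Optional[Mapping[str, Any]] = None,  # mirrors the contract; unused here
) -> bool:
    """Hypothetical gate: DMs only, and only if the bot exposes send_message_draft."""
    if chat_type != "dm":
        return False  # groups, supergroups and forum topics stay on the edit path
    # Feature-detect instead of parsing version strings: older python-telegram-bot
    # installs simply lack the method, so they fall back to edit automatically.
    return callable(getattr(bot, "send_message_draft", None))


class OlderBot:
    """Stands in for a pre-22.6 install without the draft method."""


class NewerBot:
    """Stands in for an install that exposes send_message_draft."""

    async def send_message_draft(self, **kwargs: Any) -> bool:
        return True


print(telegram_supports_draft_streaming(NewerBot(), "dm"))     # True
print(telegram_supports_draft_streaming(OlderBot(), "dm"))     # False
print(telegram_supports_draft_streaming(NewerBot(), "group"))  # False
```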

Closes

feat(telegram): streaming reply via sendMessageDraft (Bot API 9.5+) #21439 (duplicate feature request for sendMessageDraft support)

Inspired by PR #3412 by @NivOO5; re-authored against current main with attribution preserved on the substantive commit.

@NivOO5

…9.5+)

Adds Telegram's native streaming-draft API as a streaming transport so DM replies render with smooth animated previews as tokens arrive, dropping the per-edit jitter of the legacy editMessageText polling path.

Adapter contract (gateway/platforms/base.py):
- supports_draft_streaming(chat_type, metadata) -> bool. Default False. Telegram returns True only for DMs and only when the bound python-telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
- send_draft(chat_id, draft_id, content, metadata) -> SendResult. Default raises NotImplementedError. Telegram delegates to PTB's send_message_draft. Drafts have no message_id (Bot API contract); SendResult.message_id is None on success.

Telegram adapter (gateway/platforms/telegram.py):
- supports_draft_streaming gates on chat_type='dm' AND PTB capability.
- send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads message_thread_id through metadata, and routes failures back as SendResult(success=False, error=...) so the consumer can fall back.

Stream consumer (gateway/stream_consumer.py):
- StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off') and chat_type fields.
- run() resolves _use_draft_streaming once via a probe at the top of the run, allocating a fresh class-wide draft_id_counter so each response animates as its own preview (no animation collision across consecutive responses to the same chat).
- _send_or_edit gains a pre-edit branch: when drafts are active AND not finalizing AND no edit-path message_id is established, the frame routes through _send_draft_frame instead of edit_message. Drafts intentionally do NOT set _already_sent so the gateway's final sendMessage path still fires; drafts have no message_id and the user needs a real message in their chat history.
- _reset_segment_state bumps the draft_id when the consumer is in draft mode so each text block after a tool boundary animates as a fresh preview below the tool-progress bubble (avoids the inter-tool-call leak openclaw documented in their #32535).
- Per-response fallback: any send_draft failure (transient network, server reject, capability gap) flips _use_draft_streaming to False for the rest of the run, gracefully returning to the edit path.

Gateway config (gateway/config.py):
- StreamingConfig.transport default flips edit -> auto. The auto path is identical to edit on every chat type that doesn't currently support drafts (groups, supergroups, forum topics, every non-Telegram platform), so the default is backwards-compatible for non-DM users.

Lifecycle model (Telegram Bot API 9.5):
1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
2. Repeated sendMessageDraft calls with the SAME draft_id animate the preview as text grows.
3. Drafts have no message_id and cannot be edited or deleted.
4. When the response finishes, the gateway's normal sendMessage path delivers the final answer; the draft preview clears naturally on the client and the user sees a real message in their history.

Inspired by PR #3412 by @NivOO5. Re-authored against current main (stream_consumer.py is now ~4x larger than at #3412's branch base, with new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the original PR didn't account for), but the design call (DM-only, edit-fallback, transport=auto|draft|edit|off) is faithful to the original proposal, with two improvements baked in:
1. Per-response draft_id (monotonic counter, not a time hash): no collision risk across consecutive responses on the same chat.
2. Tool-boundary draft_id bump: prevents the inter-tool-call leak openclaw hit during their rollout (their #32535).

Closes #21439 (duplicate feature request).
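Both `send_draft` and the overflow handling measure length in UTF-16 code units, which differ from Python's `len()` for characters outside the Basic Multilingual Plane (most emoji count as 2). A minimal trimming sketch, assuming `MAX_MESSAGE_LENGTH` is 4096 as the later overflow commit states; `trim_utf16` is a hypothetical helper that never cuts a surrogate pair in half.

```python
MAX_MESSAGE_LENGTH = 4096  # assumed value; Telegram counts UTF-16 code units


def utf16_len(text: str) -> int:
    """Length as Telegram counts it: UTF-16 code units, not Python code points."""
    return len(text.encode("utf-16-le")) // 2


def trim_utf16(text: str, limit: int = MAX_MESSAGE_LENGTH) -> str:
    """Return the longest prefix of text whose UTF-16 length is <= limit.

    Works code point by code point, so a character that needs a surrogate
    pair (e.g. an emoji) is either kept whole or dropped, never split.
    """
    used = 0
    for index, char in enumerate(text):
        width = 2 if ord(char) > 0xFFFF else 1
        if used + width > limit:
            return text[:index]
        used += width
    return text


print(utf16_len("a😀"))       # 3
print(trim_utf16("a😀b", 2))  # a
```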

Added tests/gateway/test_stream_consumer_draft.py with 11 tests covering:
- Transport selection: auto+dm-supported -> draft; auto+group -> edit; explicit edit; explicit draft on unsupported adapter -> edit; MagicMock adapter -> edit (back-compat for the existing test suite).
- Happy path: DM stream animates draft frames with a single shared draft_id, then finalizes via a regular adapter.send.
- Group fallback: drafts entirely skipped in non-DM chats.
- Failure fallback: send_draft returning success=False disables drafts for the rest of the response.
- Draft_id lifecycle: consecutive responses use distinct ids; tool boundaries bump the id so post-tool text animates fresh below the tool-progress bubble (the openclaw #32535 leak guard).
- _already_sent contract: drafts must NOT set the flag so the gateway's fallback final-send still fires (drafts have no message_id).

Updated website/docs/user-guide/messaging/telegram.md with a 'Streaming transport' section explaining auto|draft|edit|off, the DM-only constraint, and the per-response fallback behaviour.
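The transport-selection cases listed above can all be satisfied by a small resolver. The `MagicMock adapter -> edit` case matters because every attribute of a `MagicMock` is callable and returns a truthy mock, so a plain `if adapter.supports_draft_streaming(...)` would enable drafts in the legacy test suite. The sketch below is one plausible approach (a strict `is True` check); `resolve_use_drafts` and the adapter classes are hypothetical.

```python
from typing import Any, Mapping, Optional
from unittest.mock import MagicMock


def resolve_use_drafts(
    transport: str,
    adapter: Any,
    chat_type: Optional[str],
    metadata: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Hypothetical once-per-run decision: stream via drafts or via edits?"""
    if transport not in ("auto", "draft"):
        return False  # "edit" and "off" never use drafts
    probe = getattr(adapter, "supports_draft_streaming", None)
    if not callable(probe):
        return False
    try:
        supported = probe(chat_type, metadata)
    except Exception:
        return False  # a broken probe must not break streaming
    # Demand a real True. MagicMock().supports_draft_streaming(...) returns a
    # MagicMock, which is truthy and would flip legacy tests onto drafts.
    return supported is True


class DmOnlyAdapter:
    def supports_draft_streaming(self, chat_type, metadata) -> bool:
        return chat_type == "dm"


class NoDraftAdapter:
    def supports_draft_streaming(self, chat_type, metadata) -> bool:
        return False


print(resolve_use_drafts("auto", DmOnlyAdapter(), "dm"))     # True
print(resolve_use_drafts("auto", DmOnlyAdapter(), "group"))  # False
print(resolve_use_drafts("draft", NoDraftAdapter(), "dm"))   # False
print(resolve_use_drafts("auto", MagicMock(), "dm"))         # False
```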

github-actions · 2026-05-11T02:15:21Z

🔎 Lint report: `salvage/pr-3412` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8124 on HEAD, 8123 on base (🆕 +1)

🆕 New issues (1):

Rule	Count
`unresolved-import`	1

First entries

tests/gateway/test_stream_consumer_draft.py:19: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 4268 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@kjames2001

…uncation

When edit_message_text exceeded Telegram's 4096 UTF-16 code-unit limit, the adapter caught the BadRequest, best-effort truncated the content with '…', and returned SendResult(success=True). The stream consumer believed the full edit was delivered and never recovered, silently dropping everything past the truncation boundary on long replies.

Returning failure isn't safe either: the consumer's existing fallback path can race against the next streaming tick, producing duplicate sends or gaps. Instead, the adapter now SPLITS the oversized payload across the existing message + new continuation messages, so the user always gets the full reply in correct order.

How it works:
1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH, call the new _edit_overflow_split helper directly; this saves a doomed round-trip and a Telegram error.
2. Reactive: if Telegram still returns 'message_too_long' after the pre-flight (e.g. parse_mode formatting inflated the payload past the limit via MarkdownV2 escapes), the same helper handles it.
3. _edit_overflow_split:
   - Splits via truncate_message(len_fn=utf16_len), the same chunking the non-streaming send() path uses; chunks get '(1/N)' suffixes.
   - Edits the original message_id with chunk 1 (with parse_mode + plain-fallback when finalize=True, mirroring the main edit path).
   - Sends each remaining chunk via self._bot.send_message, threaded as a reply to the previous chunk so the user sees them as a contiguous block. MarkdownV2-with-plain-fallback per chunk on finalize.
   - Returns SendResult(success=True, message_id=<last_chunk_id>, continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the stream consumer can keep editing the most recent visible message and the gateway has full visibility into every message id.

SendResult contract extension: added an optional continuation_message_ids: tuple = () field. When empty (the common case), behavior is unchanged. When populated, the caller knows the adapter delivered across multiple platform messages.

Stream consumer integration: GatewayStreamConsumer._send_or_edit advances _message_id to the last continuation id when it sees continuation_message_ids on a successful edit result, resets _last_sent_text (the new visible message holds only the final chunk's text), and fires on_new_message so tool-progress bubbles linearize below the new continuation rather than the original. Mirrors the openclaw #32535 inter-tool-leak guard.

Composes with what just landed:
- PR #23455 (UTF-16 length-aware splitting in stream consumer) prevents most overflows upstream by measuring text in UTF-16 code units before deciding to split. This PR is the safety net at the adapter boundary.
- PR #23512 (native draft streaming, default for DM Telegram) routes DM streaming through send_draft, which has its own contract unaffected by this change. So this fix narrows in scope to the edit-based path: groups, supergroups, forum topics, every non-Telegram platform, and the per-response fallback after a draft failure.

Salvage notes:
- Cherry-picked from PR #19537 by @kjames2001. The original PR returned failure on overflow; this evolves it to split-and-deliver so users never lose content and the consumer state stays consistent.
- Dropped an unrelated model-picker hunk (lines 2114-2117) that silently killed the 'X more available — type /model <name> directly' hint by hardcoding total=len(models). Not in scope.
- Restored the timeout-aware retryable=not is_timeout signal in send()'s fallthrough catch block.

Closes #19537.
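The split-and-deliver flow can be sketched with a recording bot. This is a simplified, hypothetical version: `chunk_utf16` stands in for `truncate_message(len_fn=utf16_len)`, the fixed `SUFFIX_BUDGET` for the `(i/N)` suffix is an assumption, and the MarkdownV2/plain fallback and error handling are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SUFFIX_BUDGET = 16  # assumed room reserved for a " (i/N)" suffix


def utf16_len(text: str) -> int:
    return len(text.encode("utf-16-le")) // 2


@dataclass
class SendResult:
    success: bool
    message_id: int
    continuation_message_ids: Tuple[int, ...] = ()


@dataclass
class FakeBot:
    """Records edits and sends; hands out sequential message ids."""
    last_id: int = 100
    log: List[tuple] = field(default_factory=list)

    def edit_message_text(self, message_id: int, text: str) -> None:
        self.log.append(("edit", message_id, text))

    def send_message(self, text: str, reply_to: int) -> int:
        self.last_id += 1
        self.log.append(("send", self.last_id, reply_to, text))
        return self.last_id


def chunk_utf16(text: str, limit: int) -> List[str]:
    """Greedy split into pieces of at most `limit` UTF-16 code units."""
    chunks: List[str] = []
    current, used = "", 0
    for char in text:
        width = utf16_len(char)
        if current and used + width > limit:
            chunks.append(current)
            current, used = "", 0
        current += char
        used += width
    if current:
        chunks.append(current)
    return chunks


def edit_overflow_split(bot: FakeBot, message_id: int, text: str, limit: int) -> SendResult:
    chunks = chunk_utf16(text, limit - SUFFIX_BUDGET) or [text]
    total = len(chunks)
    if total > 1:
        chunks = [f"{c} ({i}/{total})" for i, c in enumerate(chunks, start=1)]
    bot.edit_message_text(message_id, chunks[0])  # chunk 1 reuses the original message
    previous, continuation = message_id, []
    for piece in chunks[1:]:
        # Reply to the previous chunk so the pieces read as one contiguous block.
        previous = bot.send_message(piece, reply_to=previous)
        continuation.append(previous)
    return SendResult(True, message_id=previous, continuation_message_ids=tuple(continuation))


bot = FakeBot()
result = edit_overflow_split(bot, 7, "x" * 50, limit=26)
# Consumer side: keep editing the newest visible message from now on.
current_message_id = result.continuation_message_ids[-1] if result.continuation_message_ids else 7
print(result.message_id, result.continuation_message_ids)  # 104 (101, 102, 103, 104)
```

The final lines mirror the consumer integration: once `continuation_message_ids` is non-empty, subsequent edits target the last continuation rather than the original message.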

@wilsen0

…eplies

Re-authored against current main from PR #10388 by @wilsen0. The original branch is 3800+ commits stale and could not be cherry-picked without reverting unrelated work; this change carries only the perf intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
    total <= 320 cp   -> min(cap, 0.18)
    total <= 1024 cp  -> min(cap, 0.24)
    else              -> cap
  A single short reply now reaches the agent in ~180ms instead of 600ms. Tier constants compose with the configured cap via min(), so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()). Rejects NaN / Inf and applies optional min/max bounds. Used for text-batch + media-batch knobs. Prevents asyncio.sleep(NaN) crashes when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now a single source of truth. StreamConsumerConfig imports them instead of duplicating the literals, fixing the prior dual-source drift.

Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside Telegram's ~1 edit/s flood envelope, the 'all' default would accumulate edit pressure on busy chats; 'new' shows only the leading bubble per tool batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.

Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH the draft path (send_draft) and the edit path (edit_message), so the tighter cadence helps native draft as much as edit-based. The text-batch fast-path applies before the consumer starts, so it speeds up first-token latency on every transport. No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked. Dropped from the original branch:
- Unrelated d2f043f 'fix(anthropic): preserve third-party thinking continuity' commit
- boot_md.py builtin gateway hook (unrelated)
- Reverted Slack tool_progress='off' (#14663) restoration
- Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO members deletion
- 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new and Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
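The two ingress changes (NaN/Inf-safe env parsing and the adaptive tiers) are small enough to sketch directly. The signature of `env_float_clamped`, its fall-back-to-default behaviour on bad input, and the `DEMO_BATCH_DELAY` variable are assumptions; the tier thresholds and the `min(cap, ...)` composition come from the commit message.

```python
import math
import os
from typing import Optional


def env_float_clamped(
    name: str,
    default: float,
    *,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
) -> float:
    """Hypothetical reading of a float knob that can never yield NaN or Inf."""
    raw = os.getenv(name)
    try:
        value = float(raw) if raw is not None else default
    except ValueError:
        value = default  # garbage such as "0.3s" falls back to the default
    if not math.isfinite(value):
        value = default  # "nan" and "inf" parse as floats, so check explicitly
    if minimum is not None:
        value = max(minimum, value)
    if maximum is not None:
        value = min(maximum, value)
    return value


def text_batch_delay(total_codepoints: int, cap: float) -> float:
    """Adaptive fast-path tiers; min() lets a tighter operator cap win."""
    if total_codepoints <= 320:
        return min(cap, 0.18)
    if total_codepoints <= 1024:
        return min(cap, 0.24)
    return cap


os.environ["DEMO_BATCH_DELAY"] = "nan"
cap = env_float_clamped("DEMO_BATCH_DELAY", 0.3, minimum=0.0, maximum=5.0)
print(cap, text_batch_delay(100, cap), text_batch_delay(5000, cap))  # 0.3 0.18 0.3
```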

@kjames2001

…uncation When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit, the adapter caught the BadRequest, best-effort truncated the content with '…', and returned SendResult(success=True). The stream consumer believed the full edit was delivered and never recovered, silently dropping everything past the truncation boundary on long replies. Returning failure isn't safe either — the consumer's existing fallback path can race against the next streaming tick, producing duplicate sends or gaps. Instead, the adapter now SPLITS the oversized payload across the existing message + new continuation messages, so the user always gets the full reply in correct order. How it works: 1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH, call the new _edit_overflow_split helper directly — saves a doomed round-trip + a Telegram error. 2. Reactive: if Telegram still returns 'message_too_long' after the pre-flight (e.g. parse_mode formatting inflated the payload past the limit via MarkdownV2 escapes), the same helper handles it. 3. _edit_overflow_split: - Splits via truncate_message(len_fn=utf16_len) — same chunking the non-streaming send() path uses; chunks get '(1/N)' suffixes. - Edits the original message_id with chunk 1 (with parse_mode + plain-fallback when finalize=True, mirroring the main edit path). - Sends each remaining chunk via self._bot.send_message threaded as a reply to the previous chunk so the user sees them as a contiguous block. MarkdownV2-with-plain-fallback per chunk on finalize. - Returns SendResult(success=True, message_id=<last_chunk_id>, continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the stream consumer can keep editing the most recent visible message and the gateway has full visibility into every message id. SendResult contract extension: Added optional continuation_message_ids: tuple = () field. When empty (the common case), behavior is unchanged. 
When populated, the caller knows the adapter delivered across multiple platform messages. Stream consumer integration: GatewayStreamConsumer._send_or_edit advances _message_id to the last-continuation id when it sees continuation_message_ids on a successful edit result, resets _last_sent_text (the new visible message holds only the final chunk's text), and fires on_new_message so tool-progress bubbles linearize below the new continuation rather than the original. Mirrors the openclaw NousResearch#32535 inter-tool-leak guard. Composes with what just landed: - PR NousResearch#23455 (UTF-16 length-aware splitting in stream consumer) prevents most overflows upstream by measuring text in UTF-16 codeunits before deciding to split. This PR is the safety net at the adapter boundary. - PR NousResearch#23512 (native draft streaming, default for DM Telegram) routes DM streaming through send_draft, which has its own contract unaffected by this change. So this fix narrows in scope to the edit-based path: groups, supergroups, forum topics, every non-Telegram platform, and the per-response fallback after a draft failure. Salvage notes: - Cherry-picked from PR NousResearch#19537 by @kjames2001. Original PR returned failure on overflow; this evolves to split-and-deliver so users never lose content and the consumer state stays consistent. - Dropped an unrelated model-picker hunk (line 2114-2117) that silently killed the 'X more available — type /model <name> directly' hint by hardcoding total=len(models). Not in scope. - Restored the timeout-aware retryable=not is_timeout signal in send()'s fallthrough catch block. Closes NousResearch#19537.

@wilsen0

…eplies Re-authored against current main from PR NousResearch#10388 by @wilsen0. The original branch is 3800+ commits stale and could not be cherry-picked without reverting unrelated work; this change carries only the perf intent forward. Tuning summary ============== Text-batch ingress (gateway/platforms/telegram.py): - HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3 - HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0 - Adaptive fast-path tiers in _flush_text_batch: total <= 320 cp -> min(cap, 0.18) total <= 1024 cp -> min(cap, 0.24) else -> cap A single short reply now reaches the agent in ~180ms instead of 600ms. Tier constants compose with the configured cap via min() so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS below 0.18 still wins on every tier. - _env_float_clamped helper replaces bare float(os.getenv()). Rejects NaN / Inf, applies optional min/max bounds. Used for text-batch + media-batch knobs. Prevents asyncio.sleep(NaN) crashes when an operator typos an env var. Stream cadence (gateway/config.py + stream_consumer.py): - StreamingConfig.edit_interval default 1.0s -> 0.8s - StreamingConfig.buffer_threshold default 40 -> 24 chars - DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now a single source of truth. StreamConsumerConfig imports them instead of duplicating the literals; the prior dual-source drift is fixed. Tool progress (gateway/display_config.py): - Telegram default tool_progress 'all' -> 'new'. Inside Telegram's ~1 edit/s flood envelope the 'all' default would accumulate edit pressure on busy chats; 'new' shows only the leading bubble per tool batch and feels less spammy. - Slack tier_low override (tool_progress='off') is preserved. 
Composition with native draft streaming (NousResearch#23512) ================================================ The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH the draft path (send_draft) and the edit path (edit_message), so the tighter cadence helps native draft as much as edit-based. The text-batch fast-path applies before the consumer starts, so it speeds up the first-token latency on every transport. No conflict. Stale-base avoidance ==================== Re-authored from scratch rather than cherry-picked. Dropped from the original branch: - Unrelated d2f043f 'fix(anthropic): preserve third-party thinking continuity' commit - boot_md.py builtin gateway hook (unrelated) - Reverted Slack tool_progress='off' (NousResearch#14663) restoration - Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO members deletion - 2300+ lines of run.py base-skew noise Tests ===== New tests/gateway/test_telegram_text_batch_perf.py: - 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds). - 4 tests for the adaptive-tier composition rules. Updated tests/gateway/test_display_config.py: - test_platform_default_when_no_user_config: 'all' -> 'new' for Telegram, with comment. - test_high_tier_platforms: split into Telegram-overrides-to-new and Discord-stays-all assertions. Closes NousResearch#10388. Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>

@kjames2001

…uncation When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit, the adapter caught the BadRequest, best-effort truncated the content with '…', and returned SendResult(success=True). The stream consumer believed the full edit was delivered and never recovered, silently dropping everything past the truncation boundary on long replies. Returning failure isn't safe either — the consumer's existing fallback path can race against the next streaming tick, producing duplicate sends or gaps. Instead, the adapter now SPLITS the oversized payload across the existing message + new continuation messages, so the user always gets the full reply in correct order. How it works: 1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH, call the new _edit_overflow_split helper directly — saves a doomed round-trip + a Telegram error. 2. Reactive: if Telegram still returns 'message_too_long' after the pre-flight (e.g. parse_mode formatting inflated the payload past the limit via MarkdownV2 escapes), the same helper handles it. 3. _edit_overflow_split: - Splits via truncate_message(len_fn=utf16_len) — same chunking the non-streaming send() path uses; chunks get '(1/N)' suffixes. - Edits the original message_id with chunk 1 (with parse_mode + plain-fallback when finalize=True, mirroring the main edit path). - Sends each remaining chunk via self._bot.send_message threaded as a reply to the previous chunk so the user sees them as a contiguous block. MarkdownV2-with-plain-fallback per chunk on finalize. - Returns SendResult(success=True, message_id=<last_chunk_id>, continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the stream consumer can keep editing the most recent visible message and the gateway has full visibility into every message id. SendResult contract extension: Added optional continuation_message_ids: tuple = () field. When empty (the common case), behavior is unchanged. 
When populated, the caller knows the adapter delivered across multiple platform messages. Stream consumer integration: GatewayStreamConsumer._send_or_edit advances _message_id to the last-continuation id when it sees continuation_message_ids on a successful edit result, resets _last_sent_text (the new visible message holds only the final chunk's text), and fires on_new_message so tool-progress bubbles linearize below the new continuation rather than the original. Mirrors the openclaw NousResearch#32535 inter-tool-leak guard. Composes with what just landed: - PR NousResearch#23455 (UTF-16 length-aware splitting in stream consumer) prevents most overflows upstream by measuring text in UTF-16 codeunits before deciding to split. This PR is the safety net at the adapter boundary. - PR NousResearch#23512 (native draft streaming, default for DM Telegram) routes DM streaming through send_draft, which has its own contract unaffected by this change. So this fix narrows in scope to the edit-based path: groups, supergroups, forum topics, every non-Telegram platform, and the per-response fallback after a draft failure. Salvage notes: - Cherry-picked from PR NousResearch#19537 by @kjames2001. Original PR returned failure on overflow; this evolves to split-and-deliver so users never lose content and the consumer state stays consistent. - Dropped an unrelated model-picker hunk (line 2114-2117) that silently killed the 'X more available — type /model <name> directly' hint by hardcoding total=len(models). Not in scope. - Restored the timeout-aware retryable=not is_timeout signal in send()'s fallthrough catch block. Closes NousResearch#19537.

@wilsen0

…eplies Re-authored against current main from PR NousResearch#10388 by @wilsen0. The original branch is 3800+ commits stale and could not be cherry-picked without reverting unrelated work; this change carries only the perf intent forward. Tuning summary ============== Text-batch ingress (gateway/platforms/telegram.py): - HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3 - HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0 - Adaptive fast-path tiers in _flush_text_batch: total <= 320 cp -> min(cap, 0.18) total <= 1024 cp -> min(cap, 0.24) else -> cap A single short reply now reaches the agent in ~180ms instead of 600ms. Tier constants compose with the configured cap via min() so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS below 0.18 still wins on every tier. - _env_float_clamped helper replaces bare float(os.getenv()). Rejects NaN / Inf, applies optional min/max bounds. Used for text-batch + media-batch knobs. Prevents asyncio.sleep(NaN) crashes when an operator typos an env var. Stream cadence (gateway/config.py + stream_consumer.py): - StreamingConfig.edit_interval default 1.0s -> 0.8s - StreamingConfig.buffer_threshold default 40 -> 24 chars - DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now a single source of truth. StreamConsumerConfig imports them instead of duplicating the literals; the prior dual-source drift is fixed. Tool progress (gateway/display_config.py): - Telegram default tool_progress 'all' -> 'new'. Inside Telegram's ~1 edit/s flood envelope the 'all' default would accumulate edit pressure on busy chats; 'new' shows only the leading bubble per tool batch and feels less spammy. - Slack tier_low override (tool_progress='off') is preserved. 
Composition with native draft streaming (NousResearch#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH the draft path (send_draft) and the edit path (edit_message), so the tighter cadence helps native draft as much as edit-based. The text-batch fast-path applies before the consumer starts, so it speeds up the first-token latency on every transport. No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked. Dropped from the original branch:
- Unrelated d2f043f 'fix(anthropic): preserve third-party thinking continuity' commit
- boot_md.py builtin gateway hook (unrelated)
- Reverted Slack tool_progress='off' (NousResearch#14663) restoration
- Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO members deletion
- 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new and Discord-stays-all assertions.

Closes NousResearch#10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
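Because one cadence gates both transports, a single gate object can decide when buffered deltas become either a send_draft frame or an edit_message call. The sketch below assumes a push requires both that edit_interval has elapsed and that at least buffer_threshold new characters are pending; the commit message does not say how the two knobs combine, and CadenceGate is a hypothetical name. The constant values mirror the new defaults, and the clock is injectable so the gate can be tested without sleeping.

```python
import time
from typing import Callable

DEFAULT_STREAMING_EDIT_INTERVAL = 0.8    # seconds (new default)
DEFAULT_STREAMING_BUFFER_THRESHOLD = 24  # characters (new default)


class CadenceGate:
    """Decide when buffered deltas should be pushed to the active transport."""

    def __init__(self, interval: float = DEFAULT_STREAMING_EDIT_INTERVAL,
                 threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.threshold = threshold
        self._clock = clock
        self._last_flush = clock()
        self._pending = 0  # characters received since the last push

    def add(self, delta: str) -> bool:
        """Record a delta; return True when the caller should push a frame now."""
        self._pending += len(delta)
        now = self._clock()
        # Assumed rule: enough new text AND enough time since the last push,
        # so a token burst cannot exceed one push per interval.
        if self._pending >= self.threshold and now - self._last_flush >= self.interval:
            self._pending = 0
            self._last_flush = now
            return True
        return False
```

Whichever transport is active, the caller would push a frame only when add() returns True; flushing any remainder at turn end is left to the caller in this sketch.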

@wilsen0

(cherry picked from commit ac95b8c)

NivOO5 and others added 3 commits May 10, 2026 19:12

chore: AUTHOR_MAP entry for NivOO5

28da919

teknium1 merged commit 9e005d6 into main May 11, 2026
16 of 19 checks passed

teknium1 deleted the salvage/pr-3412 branch May 11, 2026 03:02

This was referenced May 11, 2026

feat(telegram): add native draft streaming for DMs #3412

Closed

fix(telegram): split-and-deliver oversized edits instead of silent truncation (salvage of #19537) #23576

Merged

teknium1 mentioned this pull request May 11, 2026

perf(gateway): tune Telegram cadence + adaptive fast-path for short replies (salvage of #10388) #23587

Merged

teknium1 mentioned this pull request May 11, 2026

perf(gateway): reduce Telegram end-to-end response latency #10388

Closed

BrewTestBot mentioned this pull request May 16, 2026

hermes-agent 2026.5.16 Homebrew/homebrew-core#283141

Merged

github-actions Bot mentioned this pull request May 17, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.7 to v2026.5.16 Docker-Hub-sirmark/docker-hermes-agent#6

Merged

feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+) (salvage of #3412)#23512

teknium1 merged 3 commits into main from salvage/pr-3412

teknium1 commented May 11, 2026

github-actions Bot commented May 11, 2026

2 participants

🔎 Lint report: salvage/pr-3412 vs origin/main

ruff

ty (type checker)
