perf(gateway): tune Telegram cadence + adaptive fast-path for short replies (salvage of #10388) #23587
Merged
…eplies

Re-authored against current main from PR #10388 by @wilsen0. The original branch is 3800+ commits stale and could not be cherry-picked without reverting unrelated work; this change carries only the perf intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
- HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
- HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
- Adaptive fast-path tiers in _flush_text_batch:
    total <= 320 cp   -> min(cap, 0.18)
    total <= 1024 cp  -> min(cap, 0.24)
    else              -> cap
  A single short reply now reaches the agent in ~180ms instead of 600ms. Tier constants compose with the configured cap via min() so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS below 0.18 still wins on every tier.
- _env_float_clamped helper replaces bare float(os.getenv()). Rejects NaN / Inf, applies optional min/max bounds. Used for text-batch + media-batch knobs. Prevents asyncio.sleep(NaN) crashes when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
- StreamingConfig.edit_interval default 1.0s -> 0.8s
- StreamingConfig.buffer_threshold default 40 -> 24 chars
- DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now a single source of truth. StreamConsumerConfig imports them instead of duplicating the literals; the prior dual-source drift is fixed.

Tool progress (gateway/display_config.py):
- Telegram default tool_progress 'all' -> 'new'. Inside Telegram's ~1 edit/s flood envelope the 'all' default would accumulate edit pressure on busy chats; 'new' shows only the leading bubble per tool batch and feels less spammy.
- Slack tier_low override (tool_progress='off') is preserved.
Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH the draft path (send_draft) and the edit path (edit_message), so the tighter cadence helps native draft as much as edit-based. The text-batch fast-path applies before the consumer starts, so it speeds up the first-token latency on every transport. No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked. Dropped from the original branch:
- Unrelated d2f043f 'fix(anthropic): preserve third-party thinking continuity' commit
- boot_md.py builtin gateway hook (unrelated)
- Reverted Slack tool_progress='off' (#14663) restoration
- Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO members deletion
- 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
- 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
- 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
- test_platform_default_when_no_user_config: 'all' -> 'new' for Telegram, with comment.
- test_high_tier_platforms: split into Telegram-overrides-to-new and Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
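The adaptive fast-path tiers listed in the tuning summary can be illustrated with a minimal sketch. Only the thresholds (320 and 1024 cp), the tier delays (0.18 and 0.24 s), and the `min(cap, tier)` composition come from the commit message; the function and constant names are hypothetical, and "cp" is assumed to mean Unicode code points. This is not the actual `_flush_text_batch` code.

```python
# Hypothetical names; only the numeric values are taken from the commit message.
SHORT_TIER_MAX_CP = 320      # "total <= 320 cp"
MEDIUM_TIER_MAX_CP = 1024    # "total <= 1024 cp"
SHORT_TIER_DELAY_S = 0.18
MEDIUM_TIER_DELAY_S = 0.24


def adaptive_batch_delay(total_codepoints: int, cap: float) -> float:
    """Pick the flush delay for a pending text batch.

    `cap` is the operator-configured delay (default 0.3 s after this change).
    Each tier is combined with the cap via min(), so a cap below the tier
    constant always takes precedence.
    """
    if total_codepoints <= SHORT_TIER_MAX_CP:
        return min(cap, SHORT_TIER_DELAY_S)
    if total_codepoints <= MEDIUM_TIER_MAX_CP:
        return min(cap, MEDIUM_TIER_DELAY_S)
    return cap


if __name__ == "__main__":
    # len() on a Python str counts code points (assumed meaning of "cp").
    print(adaptive_batch_delay(len("ok, thanks"), cap=0.3))  # 0.18
    print(adaptive_batch_delay(500, cap=0.3))                # 0.24
    print(adaptive_batch_delay(5000, cap=0.3))               # 0.3
    print(adaptive_batch_delay(10, cap=0.1))                 # 0.1 (cap wins)
```

Using `min()` rather than assigning the tier constant directly means the fast-path can only shorten the configured delay, never lengthen it.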
- New tests/gateway/test_telegram_text_batch_perf.py:
  - TestEnvFloatClamped: 7 tests covering default-when-unset, valid parse, garbage fallback, NaN rejection, Inf rejection, min-clamp, max-clamp. Asserts asyncio.sleep() always gets a finite number.
  - TestAdaptiveTextBatchTiers: 4 tests covering the tier-constant invariants and the min(cap, tier_delay) composition rule.
- tests/gateway/test_display_config.py: update assertions for Telegram's new tool_progress='new' default.
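The behaviours covered by the seven helper tests (default when unset, valid parse, garbage fallback, NaN and Inf rejection, min and max clamping) suggest a helper along the following lines. This is a sketch under assumptions: the name `env_float_clamped_sketch`, its signature, and the choice to fall back to the default on non-finite input are illustrative and may differ from the real `_env_float_clamped`.

```python
import math
import os
from typing import Optional


def env_float_clamped_sketch(
    name: str,
    default: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> float:
    """Read a float from the environment, always returning a finite number.

    Hypothetical stand-in for the PR's helper: unset, unparsable, NaN, or
    infinite values fall back to `default`; finite values are clamped to the
    optional [min_value, max_value] bounds.
    """
    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        value = float(raw)  # note: float() accepts "nan" and "inf"
    except ValueError:
        return default      # garbage such as "0.3s" or ""
    if not math.isfinite(value):
        return default      # reject NaN / +Inf / -Inf
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value
```

The explicit `math.isfinite` check matters because `float("nan")` and `float("inf")` parse successfully, so a `try/except` around `float()` alone would let them through to the sleep call.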
Contributor
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-import | 1 |
First entries
tests/gateway/test_telegram_text_batch_perf.py:19: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 4280 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Contributor
Got it!
Summary
Telegram replies feel sluggish on short messages — the user types something brief, the bot responds, and there's a noticeable ~0.6-2s wait before the first token shows. This PR tunes the cadence end-to-end:
- `tool_progress` default flips from `all` → `new` so tool-batch bubbles don't compound edit pressure on busy chats.

Composes cleanly with the native draft transport (#23512): the cadence gates both `send_draft` and `edit_message` paths, and the text-batch fast-path runs before the consumer starts, so first-token latency drops on every transport.

How the fast-path works

The tiers compose with the configured cap via `min()`, so an operator who tightens `HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS` below 0.18 still gets the lower number on every tier.

Changes
| File | Change |
|---|---|
| `gateway/platforms/telegram.py` | `_env_float_clamped` helper rejects NaN/Inf and applies min/max bounds. |
| `gateway/config.py` | `DEFAULT_STREAMING_EDIT_INTERVAL=0.8`, `DEFAULT_STREAMING_BUFFER_THRESHOLD=24`, `DEFAULT_STREAMING_CURSOR=" ▉"` as a single source of truth. `StreamingConfig` defaults reference them. |
| `gateway/stream_consumer.py` | `StreamConsumerConfig` imports the same constants. Fixes the prior dual-default drift between gateway/config.py and stream_consumer.py. |
| `gateway/display_config.py` | `_PLATFORM_DEFAULTS` keeps `_TIER_HIGH` membership but overrides `tool_progress` to `"new"`. Slack's `tool_progress="off"` (#14663) preserved. |
| `tests/gateway/test_telegram_text_batch_perf.py` | 7 tests for `_env_float_clamped` + 4 tests for adaptive-tier composition rules. |
| `tests/gateway/test_display_config.py` | Updated assertions for Telegram's `tool_progress="new"` default. |
| `scripts/release.py` | |

Salvage notes

PR #10388 by @wilsen0 was branched 3800+ commits behind current main and could not be cherry-picked. Re-authored against current main with @wilsen0's authorship preserved on the substantive commit. Dropped from the original branch:

- Unrelated `d2f043f9c` "fix(anthropic): preserve third-party thinking continuity" commit
- `boot_md.py` builtin gateway hook (unrelated)
- Reverted Slack `tool_progress="off"` override ([Bug]: slack gateway is reporting intermediate tool calls #14663) restoration
- Reverted Platform plugin discovery, `MSGRAPH_WEBHOOK` + `YUANBAO` member deletions
- 2300+ lines of `run.py` base-skew noise

Validation

`tests/gateway/test_telegram_text_batch_perf.py` + `test_display_config.py` + `test_stream_consumer*.py` + `test_telegram_format.py`: 248/248 pass.

Closes #10388.
Authored by @wilsen0; design re-authored against current main.
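
For readers unfamiliar with the "single source of truth" arrangement described for `gateway/config.py` and `gateway/stream_consumer.py`, a minimal sketch follows. The constant names and values are taken from this PR; modelling both configs as dataclasses, the `cursor` field name, and placing everything in one module are assumptions made for illustration.

```python
from dataclasses import dataclass

# Values as stated in this PR; in the real code these live in gateway/config.py.
DEFAULT_STREAMING_EDIT_INTERVAL = 0.8     # seconds between edits (was 1.0)
DEFAULT_STREAMING_BUFFER_THRESHOLD = 24   # buffered chars (was 40)
DEFAULT_STREAMING_CURSOR = " ▉"


@dataclass
class StreamingConfig:
    # Defaults reference the shared constants instead of repeating literals.
    edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
    buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
    cursor: str = DEFAULT_STREAMING_CURSOR  # field name is an assumption


@dataclass
class StreamConsumerConfig:
    # In the real layout this class would import the constants from
    # gateway/config.py; sharing them keeps the two defaults from drifting.
    edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
    buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
    cursor: str = DEFAULT_STREAMING_CURSOR
```

With this shape, changing a default touches one constant, and both config classes pick it up.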