feat(gateway): telegram media delivery - reliable albums, no caption leakage, per-item captions #9297
Open
MidnightLychee wants to merge 2 commits into
…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file handles via open() on retry, silently falling back to individual sends. Use read_bytes() instead so bytes can be reused across the MarkdownV2/plain-text retry attempts.
2. Caption text leaking into streamed message: when the response contains TextBlocks plus captioned media, edit the already-streamed message down to text-only content instead of leaving caption lines visible in the body. Falls back to delete when there are no TextBlocks.
3. Per-item captions within a single album: parser now treats a caption line immediately after a MEDIA: line as that item's caption, keeping all items in one MediaGroupBlock. A blank line after caption text ends the group. Previously the parser assigned only one caption to the whole group, and starting a new group required caption-then-MEDIA which broke up intended albums.

Also adds the ContentBlock data structures (TextBlock, ImageBlock, MediaGroupBlock, MediaGroupItem) and the default send_media_group() fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing caption attaches to last item, blank-line group separation, auto-split at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression tests in test_run_progress_topics.
Follow-up to the delivery-layer fix in c2b8de4.

Appends a shared _MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack, signal, and bluebubbles platform hints so non-Anthropic models adopt the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album, no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped: native album semantics there differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:

- gpt-5.4 adopted the album shape without any skill hint
- gemma-4-26b (heretic-apex) adopted it once the rules landed; previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder suite 150/150 green.
Closes #9291
Summary
Fixes all three issues reported in #9291. Using the repro shape from
that issue:
After this PR the Telegram gateway produces:
Here is a summary of the catalog.

Changes
1. Reliable Telegram albums (issue #9291 §1)
TelegramAdapter.send_media_group() previously opened local files via
open(fp, "rb"). When the MarkdownV2 attempt failed, the retry path
could not cleanly reuse the handles, and the call fell through to the
base-class individual-send fallback while still returning
success=True.

The primary and retry paths now both use file_path.read_bytes(), so
the bytes are independently reusable across attempts. The
individual-send fallback now emits an explicit warning when reached,
so silent degradation is observable.
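A minimal sketch of the byte-reuse idea, not the PR's actual code: the files are read once into bytes objects, and the same objects are handed to both the MarkdownV2 attempt and the plain-text retry. The names send_album, FormattingError, and the send callable are hypothetical stand-ins for the adapter's transport layer.

```python
from pathlib import Path


class FormattingError(Exception):
    """Hypothetical stand-in for a MarkdownV2 rejection by the transport."""


def send_album(paths, caption, send):
    """Read each file once, then reuse the bytes for every attempt.

    `send(payloads, caption, parse_mode)` is an assumed transport callable.
    """
    # bytes objects are immutable and have no read position, so a failed
    # first attempt cannot leave them half-consumed the way a file handle can.
    payloads = [Path(p).read_bytes() for p in paths]
    try:
        return send(payloads, caption, "MarkdownV2")
    except FormattingError:
        # Plain-text retry with exactly the same payloads.
        return send(payloads, caption, None)
```

An open file handle, by contrast, sits at end-of-file after the first upload attempt, so a retry that reuses it would send empty content unless every handle is rewound or reopened.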
2. No caption leakage in streamed message (issue #9291 §2)
In streaming mode the full response (caption lines included) is
streamed as a single text message before media delivery runs. The
previous behaviour left those caption lines visible in the prose.
_deliver_media_from_response() now detects the TextBlock +
captioned-media case and edits the streamed message down to the
TextBlock content only, stripping caption lines. Pure-media
responses keep the original delete-and-replace behaviour.
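The decision described above can be sketched roughly as follows. This is an illustration under assumptions, not the body of _deliver_media_from_response(): the dataclass fields, the function name, and the "keep" outcome for cases the description does not mention are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class TextBlock:  # field name is an assumption
    text: str


@dataclass
class MediaItem:  # hypothetical stand-in for a parsed media entry
    path: str
    caption: Optional[str] = None


Block = Union[TextBlock, MediaItem]


def plan_streamed_message_fix(blocks: List[Block]) -> Tuple[str, Optional[str]]:
    """Return ("edit", prose), ("delete", None), or ("keep", None)."""
    media = [b for b in blocks if isinstance(b, MediaItem)]
    prose = "\n\n".join(
        b.text.strip() for b in blocks
        if isinstance(b, TextBlock) and b.text.strip()
    )
    if not media:
        return ("keep", None)    # nothing to clean up
    if not prose:
        return ("delete", None)  # pure-media: delete-and-replace
    if any(m.caption for m in media):
        return ("edit", prose)   # strip caption lines, keep prose only
    return ("keep", None)        # assumption: case not covered by the description
```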
3. Per-item captions within a single album (issue #9291 §3)
_parse_content_blocks() is reworked so that:

- A caption line immediately after a MEDIA: line attaches to that
  item as its caption, keeping items in the same MediaGroupBlock.
- A blank line after caption text ends the group.
- A trailing caption after the last MEDIA: line attaches to the LAST
  item when the group uses per-item captions, or to the FIRST item when
  no per-item captions are present, preserving the legacy group-level
  caption shape exactly.
This lets an agent emit an arbitrary mix of per-item captions, a
shared trailing caption, or no captions at all, without accidentally
splitting one intended album into multiple groups, and without
changing the observed behaviour of any pre-existing response shape.
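One possible reading of these parsing rules, written as a small standalone sketch rather than the real _parse_content_blocks(). It handles only MEDIA: lines and single-line captions; the Item class, the function name, and the treatment of prose (it simply closes the open group and is discarded) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ALBUM_ITEMS = 10  # Telegram's album limit, as cited in the Tests section


@dataclass
class Item:  # hypothetical stand-in for MediaGroupItem
    path: str
    caption: Optional[str] = None


def parse_media_groups(response: str) -> List[List[Item]]:
    groups: List[List[Item]] = []
    current: List[Item] = []
    prev = "other"  # kind of the previous line: media, caption, blank, other

    def close_group() -> None:
        nonlocal current
        if not current:
            return
        captioned = [it for it in current if it.caption]
        # Legacy shape: a single caption after the last MEDIA: line is
        # treated as group-level and moved to the first item.
        if len(current) > 1 and len(captioned) == 1 and captioned[0] is current[-1]:
            current[0].caption, current[-1].caption = current[-1].caption, None
        # Auto-split oversized groups into album-sized chunks.
        for start in range(0, len(current), MAX_ALBUM_ITEMS):
            groups.append(current[start:start + MAX_ALBUM_ITEMS])
        current = []

    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("MEDIA:"):
            current.append(Item(path=line[len("MEDIA:"):].strip()))
            prev = "media"
        elif not line:
            if prev == "caption":
                close_group()  # blank line after caption text ends the group
            prev = "blank"
        elif prev == "media":
            current[-1].caption = line  # caption directly after a MEDIA: line
            prev = "caption"
        else:
            close_group()  # prose is ignored in this sketch
            prev = "other"
    close_group()
    return groups
```

With this reading, two MEDIA: lines each followed by a caption stay in one group with two captions, while two MEDIA: lines followed by one caption produce a group whose first item carries it.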
Structural Additions
- ContentBlock dataclasses (TextBlock, ImageBlock, MediaGroupBlock,
  MediaGroupItem) in gateway/platforms/base.py.
- Default send_media_group() fallback on BasePlatformAdapter so
  adapters without native album support degrade gracefully to
  sequential single sends.
- StreamConsumer.response_message_id property exposes the
  already-streamed message ID so the deliverer can edit it in place.
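To illustrate the graceful-degradation idea, here is a rough sketch of a base-class fallback that sends album items one at a time. The field names, the send_photo signature, and the class name are assumptions; the real definitions live in gateway/platforms/base.py and may differ.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaGroupItem:  # field names are assumptions
    path: str
    caption: Optional[str] = None


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


class BaseAdapterSketch:
    """Hypothetical base adapter without native album support."""

    async def send_photo(self, chat_id: str, path: str,
                         caption: Optional[str] = None) -> bool:
        raise NotImplementedError

    async def send_media_group(self, chat_id: str, block: MediaGroupBlock) -> bool:
        # Fallback: one single send per item, in order. Every item is
        # attempted even if an earlier one fails; the result is True only
        # when all sends succeeded.
        ok = True
        for item in block.items:
            sent = await self.send_photo(chat_id, item.path, item.caption)
            ok = ok and sent
        return ok
```

An adapter with native albums would override send_media_group() and never reach this loop.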
Prompt-Side Adoption (added in this PR)
The delivery layer accepts the per-item caption shape, but getting
non-Anthropic models to produce it reliably needs prompt guidance.
agent/prompt_builder.py now appends a shared _MEDIA_GROUP_RULES
block to the telegram, whatsapp, discord, slack, signal, and
bluebubbles platform hints. The block is example-first, lists
the split/merge pitfalls explicitly, and points at the
telegram-media-group-captions skill for complex cases.

weixin/wecom are intentionally skipped: native album semantics
there differ from Telegram/Signal and warrant a separate pass.
Verified on live Telegram
Tests
tests/agent/test_prompt_builder.py::TestMediaGroupRules adds 92
cases covering: rule presence on each included platform, absence on
weixin/email, the exact example block, and the bullet rules.

Full test_prompt_builder suite: 150/150 green.

Behavioural Side-Effect Worth Calling Out
Refactoring the caption path through a shared _prepare_caption()
helper means all Telegram send_* methods (send_photo, send_video,
send_document, send_voice, send_audio, send_animation, etc.) now
attempt MarkdownV2 formatting with a plain-text fallback, where
previously they sent captions as plain text
only. This is a deliberate widening — it makes caption behaviour
consistent across all Telegram sends — but it is broader than "just
albums", so flagging it here for reviewer awareness.
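A rough sketch of why routing every send method through one caption helper widens behaviour. This is not the PR's _prepare_caption(); the class, the transport callable, the formatter, and CaptionFormatError are hypothetical. Only sendPhoto and sendDocument are taken from the Telegram Bot API as real method names.

```python
from typing import Callable, Optional


class CaptionFormatError(Exception):
    """Hypothetical error raised when formatted caption text is rejected."""


class TelegramSenderSketch:
    def __init__(self,
                 transport: Callable[[str, bytes, Optional[str], Optional[str]], str],
                 formatter: Callable[[str], str]):
        self.transport = transport  # assumed: (method, payload, caption, parse_mode)
        self.formatter = formatter  # assumed: converts caption text to MarkdownV2

    def _send_with_caption(self, method: str, payload: bytes,
                           caption: Optional[str]) -> str:
        if caption:
            try:
                return self.transport(method, payload,
                                      self.formatter(caption), "MarkdownV2")
            except CaptionFormatError:
                pass  # fall through to the plain-text attempt
        return self.transport(method, payload, caption, None)

    # Every public method shares the helper, so all of them gain the
    # formatted-then-plain behaviour at once.
    def send_photo(self, payload: bytes, caption: Optional[str] = None) -> str:
        return self._send_with_caption("sendPhoto", payload, caption)

    def send_document(self, payload: bytes, caption: Optional[str] = None) -> str:
        return self._send_with_caption("sendDocument", payload, caption)
```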
Tests
- New parser tests in tests/gateway/test_platform_base.py cover
  per-item captions, legacy group-level trailing caption attaching to
  the first item, mixed per-item + trailing caption attaching to the
  last item, blank-line group separation, auto-split at Telegram's
  10-item album limit, URL-based media, and the FILE: document prefix.
- AlreadySentAgent stub and tests in
  tests/gateway/test_run_progress_topics.py exercise the streaming
  delivery path end-to-end.
- test_platform_base.py + test_run_progress_topics.py → 105 passed,
  0 failed.
- Full tests/ suite: baseline on main has 79 pre-existing failures,
  unchanged by this PR (verified by running the full suite on HEAD~1
  and HEAD and diffing the failure sets).

Manual Testing
Verified on a live Telegram deployment with a multi-item product
catalogue response (multiple items × multiple views). Confirmed:
- The FILE:MEDIA: prefix still delivers as a document.
- Formatting (**bold**, ||spoiler||) renders correctly in single
  images and in album items.
Files Changed
- gateway/platforms/base.py: ContentBlock types, default send_media_group() fallback
- gateway/platforms/telegram.py: send_media_group() byte-buffer fix, _prepare_caption() helper, delete_message()
- gateway/run.py: _deliver_media_from_response() edits streamed message when prose + captioned media coexist
- gateway/stream_consumer.py: response_message_id property
- tests/gateway/test_platform_base.py
- tests/gateway/test_run_progress_topics.py