feat(gateway): telegram media delivery — reliable albums, no caption leakage, per-item captions#9297

Open
MidnightLychee wants to merge 2 commits into NousResearch:main from MidnightLychee:fix/telegram-media-group-captions

MidnightLychee commented Apr 14, 2026

Closes #9291

Summary

Fixes all three issues reported in #9291. Using the repro shape from
that issue:

Here is a summary of the catalog.

MEDIA:/path/item_a_view1.png
Item A — view 1
MEDIA:/path/item_a_view2.png
Item A — view 2
MEDIA:/path/item_b_view1.png
Item B — view 1
MEDIA:/path/item_b_view2.png
Item B — view 2

MEDIA:/path/featured.png
Featured item

After this PR the Telegram gateway produces:

  • One text-only message: Here is a summary of the catalog.
  • One album of four photos, each with its own caption.
  • One single photo with its own caption.

Changes

1. Reliable Telegram albums (issue #9291 §1)

TelegramAdapter.send_media_group() previously opened local files via
open(fp, "rb"). When the MarkdownV2 attempt failed, the retry path
could not cleanly reuse the handles, and the call fell through to the
base-class individual-send fallback — while still returning
success=True.

The primary and retry paths now both use file_path.read_bytes(), so
the bytes are independently reusable across attempts. The
individual-send fallback now emits an explicit warning when reached,
so silent degradation is observable.
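
To illustrate why bytes are easier to reuse than an open handle, here is a minimal sketch. The two `attempt_*` functions are hypothetical stand-ins for an upload attempt, not the adapter's real code, and the exact failure mode inside `send_media_group()` may differ from this simplified version.

```python
import tempfile
from pathlib import Path


def attempt_with_handle(handle) -> bytes:
    """Hypothetical upload attempt: it consumes the stream it is given."""
    return handle.read()


def attempt_with_bytes(payload: bytes) -> bytes:
    """Hypothetical upload attempt that receives an immutable payload."""
    return payload


with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "item_a_view1.png"
    file_path.write_bytes(b"fake-png-bytes")

    # Handle-based: the first attempt drains the stream, so a retry that
    # reuses the same handle has nothing left to send.
    with open(file_path, "rb") as fh:
        first_try = attempt_with_handle(fh)
        retry = attempt_with_handle(fh)

    # Bytes-based: the payload is read once and is identical on every attempt.
    data = file_path.read_bytes()
    first_try_bytes = attempt_with_bytes(data)
    retry_bytes = attempt_with_bytes(data)

print(len(first_try), len(retry), len(retry_bytes))
```

The trade-off is that `read_bytes()` holds each whole file in memory for the duration of the call, which is usually acceptable for an album of at most ten items.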

2. No caption leakage in streamed message (issue #9291 §2)

In streaming mode the full response (caption lines included) is
streamed as a single text message before media delivery runs. The
previous behaviour left those caption lines visible in the prose.

_deliver_media_from_response() now detects the TextBlock +
captioned-media case and edits the streamed message down to the
TextBlock content only, stripping caption lines. Pure-media
responses keep the original delete-and-replace behaviour.
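
The edit-versus-delete decision can be sketched roughly as follows. The block classes below are simplified stand-ins with assumed field names, `plan_streamed_message` is a hypothetical helper, and the "keep" outcome for responses without captioned media is an assumption rather than something this PR states.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


Block = Union[TextBlock, MediaGroupBlock]


def plan_streamed_message(blocks: List[Block]) -> Tuple[str, Optional[str]]:
    """Decide what to do with the message that was already streamed."""
    texts = [b.text for b in blocks if isinstance(b, TextBlock)]
    media = [b for b in blocks if isinstance(b, MediaGroupBlock)]
    has_captions = any(item.caption for group in media for item in group.items)

    if media and not texts:
        # Pure-media response: delete the streamed text and replace it.
        return ("delete", None)
    if texts and has_captions:
        # Prose plus captioned media: shrink the message to the prose only.
        return ("edit", "\n\n".join(texts))
    # Assumed default: nothing to strip, leave the message alone.
    return ("keep", None)


mixed = [
    TextBlock("Here is a summary of the catalog."),
    MediaGroupBlock([MediaGroupItem("/path/item_a_view1.png", "Item A, view 1")]),
]
pure_media = [MediaGroupBlock([MediaGroupItem("/path/featured.png", "Featured item")])]

print(plan_streamed_message(mixed))
print(plan_streamed_message(pure_media))
```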

3. Per-item captions within a single album (issue #9291 §3)

_parse_content_blocks() is reworked so that:

  • A caption line immediately after a MEDIA: line attaches to that
    item's caption, keeping items in the same MediaGroupBlock.
  • A blank line after caption text ends the current group.
  • Trailing caption after the last MEDIA: attaches to the LAST item
    when the group uses per-item captions, or to the FIRST item when no
    per-item captions are present — preserving the legacy group-level
    caption shape exactly.

This lets an agent emit an arbitrary mix of per-item captions, a
shared trailing caption, or no captions at all, without accidentally
splitting one intended album into multiple groups, and without
changing the observed behaviour of any pre-existing response shape.
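
The rules above can be approximated with a small line-based parser. This is a simplified sketch under stated assumptions, not the actual `_parse_content_blocks()`: the function names and field names are hypothetical, a single image is represented as a one-item group rather than an `ImageBlock`, and the legacy shape is detected as "only the last item ended up with a caption".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


Block = Union[TextBlock, MediaGroupBlock]


def _close_group(group: Optional[MediaGroupBlock], blocks: List[Block]) -> None:
    if group is None or not group.items:
        return
    captioned = [item for item in group.items if item.caption]
    last = group.items[-1]
    # Legacy shape (assumed detection): several items and only a trailing
    # caption, so treat it as a group-level caption on the first item.
    if len(group.items) > 1 and len(captioned) == 1 and captioned[0] is last:
        group.items[0].caption, last.caption = last.caption, None
    blocks.append(group)


def parse_blocks(response: str) -> List[Block]:
    blocks: List[Block] = []
    group: Optional[MediaGroupBlock] = None
    prev_was_media = False
    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("MEDIA:"):
            if group is None:
                group = MediaGroupBlock()
            group.items.append(MediaGroupItem(path=line[len("MEDIA:"):].strip()))
            prev_was_media = True
        elif line and prev_was_media:
            # Caption line directly after a MEDIA: line belongs to that item.
            group.items[-1].caption = line
            prev_was_media = False
        elif not line:
            # A blank line after caption text ends the current group.
            if group is not None and group.items[-1].caption:
                _close_group(group, blocks)
                group = None
            prev_was_media = False
        else:
            # Any other non-empty line is prose and also ends an open group.
            _close_group(group, blocks)
            group = None
            blocks.append(TextBlock(line))
            prev_was_media = False
    _close_group(group, blocks)
    return blocks


catalog = parse_blocks(
    "Here is a summary of the catalog.\n\n"
    "MEDIA:/path/item_a_view1.png\nItem A, view 1\n"
    "MEDIA:/path/item_a_view2.png\nItem A, view 2\n\n"
    "MEDIA:/path/featured.png\nFeatured item\n"
)
legacy = parse_blocks("MEDIA:/path/a.png\nMEDIA:/path/b.png\nShared caption")
print(catalog)
print(legacy)
```

Run against a shortened version of the repro, the sketch yields one text block, one two-item group with per-item captions, and one single-item group. The legacy input keeps its shared caption on the first item.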

Structural Additions

  • ContentBlock dataclasses (TextBlock, ImageBlock,
    MediaGroupBlock, MediaGroupItem) in gateway/platforms/base.py.
  • Default send_media_group() fallback on BasePlatformAdapter so
    adapters without native album support degrade gracefully to
    sequential single sends.
  • StreamConsumer.response_message_id property exposes the
    already-streamed message ID so the deliverer can edit it in place.
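
As an illustration of the graceful-degradation idea, the following sketch shows a base class whose default album method loops over single sends. `BaseAdapterSketch`, `send_image`, and the boolean return type are assumptions made for the example and do not mirror the real `BasePlatformAdapter` signatures.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


class BaseAdapterSketch:
    """Hypothetical base adapter, not the project's BasePlatformAdapter."""

    async def send_image(self, chat_id: str, item: MediaGroupItem) -> bool:
        raise NotImplementedError

    async def send_media_group(self, chat_id: str, items: List[MediaGroupItem]) -> bool:
        # Default behaviour: no native album support, so send the items
        # one at a time and in order.
        results = []
        for item in items:
            results.append(await self.send_image(chat_id, item))
        return all(results)


class RecordingAdapter(BaseAdapterSketch):
    """Test double that records what would have been sent."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str, Optional[str]]] = []

    async def send_image(self, chat_id: str, item: MediaGroupItem) -> bool:
        self.sent.append((chat_id, item.path, item.caption))
        return True


adapter = RecordingAdapter()
ok = asyncio.run(
    adapter.send_media_group(
        "chat-1",
        [
            MediaGroupItem("/path/item_a_view1.png", "Item A, view 1"),
            MediaGroupItem("/path/item_a_view2.png", "Item A, view 2"),
        ],
    )
)
print(ok, adapter.sent)
```

An adapter with native album support would override `send_media_group()` and never reach the loop.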

Prompt-Side Adoption (added in this PR)

The delivery layer accepts the per-item caption shape, but getting
non-Anthropic models to produce it reliably needs prompt guidance.
agent/prompt_builder.py now appends a shared _MEDIA_GROUP_RULES
block to the telegram, whatsapp, discord, slack, signal,
and bluebubbles platform hints. The block is example-first, lists
the split/merge pitfalls explicitly, and points at the
telegram-media-group-captions skill for complex cases.

weixin/wecom are intentionally skipped — native album semantics
there differ from Telegram/Signal and warrant a separate pass.
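
The opt-in shape of this change might look something like the sketch below. The rules text is a placeholder (the real `_MEDIA_GROUP_RULES` wording is not reproduced here), and `build_platform_hint` and `PLATFORMS_WITH_MEDIA_RULES` are hypothetical names rather than the ones used in `agent/prompt_builder.py`.

```python
# Placeholder text only; the PR's actual rules block is not reproduced here.
MEDIA_GROUP_RULES = "When sending 2 or more images, emit them as one album."

# Platforms named in this PR as receiving the shared rules block.
PLATFORMS_WITH_MEDIA_RULES = frozenset(
    {"telegram", "whatsapp", "discord", "slack", "signal", "bluebubbles"}
)


def build_platform_hint(platform: str, base_hint: str) -> str:
    """Append the shared rules only for platforms that opted in."""
    if platform in PLATFORMS_WITH_MEDIA_RULES:
        return f"{base_hint}\n\n{MEDIA_GROUP_RULES}"
    return base_hint


telegram_hint = build_platform_hint("telegram", "You are replying on Telegram.")
weixin_hint = build_platform_hint("weixin", "You are replying on Weixin.")
print(telegram_hint)
print(weixin_hint)
```

Keeping the platform set explicit means skipped platforms such as weixin and wecom stay untouched until someone adds them deliberately.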

Verified on live Telegram

| Model | Before rules | After rules |
| --- | --- | --- |
| gpt-5.4 | already followed the shape | unchanged |
| gemma-4-26b (heretic-apex) | split each image into its own message unless the user named the skill | single album, per-item captions, no skill hint needed |

Prompt-Builder Tests

tests/agent/test_prompt_builder.py::TestMediaGroupRules adds 92
cases covering: rule presence on each included platform, absence on
weixin/email, the exact example block, and the bullet rules.
Full test_prompt_builder suite: 150/150 green.

Behavioural Side-Effect Worth Calling Out

Refactoring the caption path through a shared _prepare_caption()
helper means all Telegram send_* methods (send_photo,
send_video, send_document, send_voice, send_audio,
send_animation, etc.) now attempt MarkdownV2 formatting with a
plain-text fallback, where previously they sent captions as plain text
only. This is a deliberate widening — it makes caption behaviour
consistent across all Telegram sends — but it is broader than "just
albums", so flagging it here for reviewer awareness.
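
The formatted-then-plain pattern described here can be sketched as below. This is not the real `_prepare_caption()`: `FormattingRejected`, `send_with_caption_fallback`, and the fake sender are hypothetical, and the real helper may also transform the caption text between attempts.

```python
from typing import Callable, List, Optional, Tuple


class FormattingRejected(Exception):
    """Hypothetical error standing in for a parse failure from the API."""


Sender = Callable[[str, Optional[str]], str]


def send_with_caption_fallback(send: Sender, caption: str) -> str:
    """Try MarkdownV2 first, then resend the same caption as plain text."""
    try:
        return send(caption, "MarkdownV2")
    except FormattingRejected:
        return send(caption, None)


calls: List[Tuple[str, Optional[str]]] = []


def fake_send(caption: str, parse_mode: Optional[str]) -> str:
    """Fake sender that rejects an unescaped '!' in MarkdownV2 mode."""
    calls.append((caption, parse_mode))
    # '!' is reserved in MarkdownV2 and must be written as '\!'.
    if parse_mode == "MarkdownV2" and "!" in caption.replace("\\!", ""):
        raise FormattingRejected(caption)
    return "formatted" if parse_mode else "plain"


result_ok = send_with_caption_fallback(fake_send, "*Item A* view 1")
result_fallback = send_with_caption_fallback(fake_send, "New arrival!")
print(result_ok, result_fallback, calls)
```

Because the fallback now applies to every `send_*` method, a caption that used to go out verbatim may render differently when its MarkdownV2 form parses successfully.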

Tests

  • Parser: 13 tests in tests/gateway/test_platform_base.py cover
    per-item captions, legacy group-level trailing caption attaching to
    the first item, mixed per-item + trailing caption attaching to the
    last item, blank-line group separation, auto-split at Telegram's
    10-item album limit, URL-based media, and the FILE: document
    prefix.
  • Streaming regression: new AlreadySentAgent stub and tests in
    tests/gateway/test_run_progress_topics.py exercise the streaming
    delivery path end-to-end.
  • Affected suite: test_platform_base.py +
    test_run_progress_topics.py → 105 passed, 0 failed.
  • Full suite: 11,234 passed, 0 new failures attributable to this
    PR (tests/ baseline on main has 79 pre-existing failures,
    unchanged by this PR — verified by running the full suite on HEAD~1
    and HEAD and diffing the failure sets).
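
For readers unfamiliar with the auto-split case covered by the parser tests, a minimal chunking sketch follows. `split_album` and `TELEGRAM_ALBUM_LIMIT` are hypothetical names, and how the real parser handles a leftover chunk is not shown in this PR description.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Telegram accepts at most 10 items in one media group.
TELEGRAM_ALBUM_LIMIT = 10


def split_album(items: Sequence[T], limit: int = TELEGRAM_ALBUM_LIMIT) -> List[List[T]]:
    """Split items into consecutive chunks of at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [list(items[i:i + limit]) for i in range(0, len(items), limit)]


paths = [f"/path/view_{n}.png" for n in range(23)]
chunks = split_album(paths)
print([len(chunk) for chunk in chunks])
```

Telegram's `sendMediaGroup` also requires at least two items, so a trailing chunk of exactly one item would need to go out as a single send.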

Manual Testing

Verified on a live Telegram deployment with a multi-item product
catalogue response (multiple items × multiple views). Confirmed:

  • Album forms correctly with each item's caption attached.
  • Streamed prose message ends up with intro text only.
  • Single trailing image with caption still works.
  • FILE:MEDIA: prefix still delivers as a document.
  • MarkdownV2 in captions (**bold**, ||spoiler||) renders correctly
    in single images and in album items.

Files Changed

| File | Change |
| --- | --- |
| gateway/platforms/base.py | Parser rewrite, ContentBlock types, default send_media_group() fallback |
| gateway/platforms/telegram.py | send_media_group() byte-buffer fix, _prepare_caption() helper, delete_message() |
| gateway/run.py | _deliver_media_from_response() edits streamed message when prose + captioned media coexist |
| gateway/stream_consumer.py | response_message_id property |
| tests/gateway/test_platform_base.py | Parser tests |
| tests/gateway/test_run_progress_topics.py | Streaming regression tests |

…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file
   handles via open() on retry, silently falling back to individual
   sends. Use read_bytes() instead so bytes can be reused across the
   MarkdownV2/plain-text retry attempts.

2. Caption text leaking into streamed message: when the response
   contains TextBlocks plus captioned media, edit the already-streamed
   message down to text-only content instead of leaving caption lines
   visible in the body. Falls back to delete when there are no
   TextBlocks.

3. Per-item captions within a single album: parser now treats a
   caption line immediately after a MEDIA: line as that item's
   caption, keeping all items in one MediaGroupBlock. A blank line
   after caption text ends the group. Previously the parser assigned
   only one caption to the whole group, and starting a new group
   required caption-then-MEDIA which broke up intended albums.

Also adds the ContentBlock data structures (TextBlock, ImageBlock,
MediaGroupBlock, MediaGroupItem) and the default send_media_group()
fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing
caption attaches to last item, blank-line group separation, auto-split
at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression
tests in test_run_progress_topics.

Follow-up to the delivery-layer fix in c2b8de4. Appends a shared
_MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack,
signal, and bluebubbles platform hints so non-Anthropic models adopt
the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album,
  no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped — native album semantics there
differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:
  - gpt-5.4 adopted the album shape without any skill hint
  - gemma-4-26b (heretic-apex) adopted it once the rules landed;
    previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder
suite 150/150 green.

alt-glitch added the type/bug, P2, comp/gateway, and platform/telegram labels on Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

Telegram media delivery: albums silently degrade, captions leak into streamed text, and per-item captions aren't supported
