feat(gateway): telegram media delivery — reliable albums, no caption leakage, per-item captions by MidnightLychee · Pull Request #9297 · NousResearch/hermes-agent

MidnightLychee · 2026-04-14T01:37:40Z

Summary

Fixes all three issues reported in #9291. Using the repro shape from
that issue:

Here is a summary of the catalog.

MEDIA:/path/item_a_view1.png
Item A — view 1
MEDIA:/path/item_a_view2.png
Item A — view 2
MEDIA:/path/item_b_view1.png
Item B — view 1
MEDIA:/path/item_b_view2.png
Item B — view 2

MEDIA:/path/featured.png
Featured item

After this PR the Telegram gateway produces:

- One text-only message: Here is a summary of the catalog.
- One album of four photos, each with its own caption.
- One single photo with its own caption.

Changes

1. Reliable Telegram albums (issue #9291 §1)

TelegramAdapter.send_media_group() previously opened local files via
open(fp, "rb"). When the MarkdownV2 attempt failed, the retry path
could not cleanly reuse the handles, and the call fell through to the
base-class individual-send fallback — while still returning
success=True.

The primary and retry paths now both use file_path.read_bytes(), so
the bytes are independently reusable across attempts. The
individual-send fallback now emits an explicit warning when reached,
so silent degradation is observable.
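A minimal sketch of why bytes survive a retry where a file handle may not: once a handle has been read to EOF, a second `read()` returns `b""` unless it is rewound, whereas a `bytes` object has no cursor. `send_album` and `FormatError` are hypothetical stand-ins for the Telegram call and a MarkdownV2 rejection, not the adapter's actual API.

```python
import logging
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("gateway.sketch")


class FormatError(Exception):
    """Hypothetical stand-in for the platform rejecting MarkdownV2 markup."""


Payload = List[Tuple[bytes, str]]  # (file content, caption) per album item


def send_with_retry(
    paths: Sequence[str],
    captions: Sequence[str],
    send_album: Callable[[Payload, Optional[str]], None],
) -> str:
    # Read every file exactly once. The resulting bytes can be handed to
    # any number of attempts without rewinding or reopening anything.
    blobs = [Path(p).read_bytes() for p in paths]
    try:
        send_album(list(zip(blobs, captions)), "MarkdownV2")
        return "markdown"
    except FormatError:
        # Make the degraded path visible instead of failing silently.
        logger.warning("MarkdownV2 album send failed; retrying as plain text")
        send_album(list(zip(blobs, captions)), None)
        return "plain"
```

Both attempts receive the full file content, so the retry can still form an album rather than dropping to individual sends.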

2. No caption leakage in streamed message (issue #9291 §2)

In streaming mode the full response (caption lines included) is
streamed as a single text message before media delivery runs. The
previous behaviour left those caption lines visible in the prose.

_deliver_media_from_response() now detects the TextBlock +
captioned-media case and edits the streamed message down to the
TextBlock content only, stripping caption lines. Pure-media
responses keep the original delete-and-replace behaviour.
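The decision described above can be sketched as a small pure function. This is an illustration under assumptions, not the body of `_deliver_media_from_response()`: `MediaBlock` is a hypothetical simplification of the PR's media block types, and the `"keep"` outcome covers shapes the description does not discuss.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaBlock:  # hypothetical simplification of ImageBlock / MediaGroupBlock
    path: str
    caption: str = ""


def plan_streamed_message(
    blocks: List[Union[TextBlock, MediaBlock]],
) -> Tuple[str, Optional[str]]:
    """Return ("edit", new_text), ("delete", None) or ("keep", None)."""
    texts = [b.text for b in blocks if isinstance(b, TextBlock)]
    media = [b for b in blocks if isinstance(b, MediaBlock)]
    if texts and any(m.caption for m in media):
        # Prose plus captioned media: shrink the streamed message to prose only.
        return ("edit", "\n\n".join(texts))
    if media and not texts:
        # Pure-media response: original delete-and-replace behaviour.
        return ("delete", None)
    # Anything else is left untouched in this sketch.
    return ("keep", None)
```

For the catalog example in the summary, this returns `("edit", "Here is a summary of the catalog.")`.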

3. Per-item captions within a single album (issue #9291 §3)

_parse_content_blocks() is reworked so that:

- A caption line immediately after a MEDIA: line becomes that
  item's caption, keeping items in the same MediaGroupBlock.
- A blank line after caption text ends the current group.
- A trailing caption after the last MEDIA: line attaches to the LAST
  item when the group uses per-item captions, or to the FIRST item
  when no per-item captions are present, preserving the legacy
  group-level caption shape exactly.

This lets an agent emit an arbitrary mix of per-item captions, a
shared trailing caption, or no captions at all, without accidentally
splitting one intended album into multiple groups, and without
changing the observed behaviour of any pre-existing response shape.
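The three rules can be sketched as a small line-oriented parser. This is one reading of the description, not the code of `_parse_content_blocks()`: the function names are hypothetical, the dataclass fields are assumed, and URL media, the FILE: prefix, and the 10-item split are left out. In particular, the "legacy" case is interpreted here as a group in which only the final item ended up with a caption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaGroupItem:
    path: str
    caption: str = ""


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


def _close_group(group: MediaGroupBlock) -> MediaGroupBlock:
    captioned = [i for i in group.items if i.caption]
    # Assumed legacy shape: only the last item has a caption, so treat it
    # as a group-level caption and move it to the first item.
    if len(group.items) > 1 and len(captioned) == 1 and captioned[0] is group.items[-1]:
        group.items[0].caption = group.items[-1].caption
        group.items[-1].caption = ""
    return group


def parse_blocks(response: str) -> List[Union[TextBlock, MediaGroupBlock]]:
    blocks: List[Union[TextBlock, MediaGroupBlock]] = []
    text_lines: List[str] = []
    group: Optional[MediaGroupBlock] = None
    after_caption = False  # True when the previous non-blank line was a caption

    def flush_text() -> None:
        text = "\n".join(text_lines).strip()
        if text:
            blocks.append(TextBlock(text))
        text_lines.clear()

    def flush_group() -> None:
        nonlocal group, after_caption
        if group is not None:
            blocks.append(_close_group(group))
        group, after_caption = None, False

    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("MEDIA:"):
            flush_text()
            if group is None:
                group = MediaGroupBlock()
            group.items.append(MediaGroupItem(line[len("MEDIA:"):].strip()))
            after_caption = False
        elif not line:
            if group is not None and after_caption:
                flush_group()  # blank line after caption text ends the group
            elif group is None:
                text_lines.append("")
        elif group is not None:
            item = group.items[-1]  # caption line attaches to the latest item
            item.caption = f"{item.caption}\n{line}" if item.caption else line
            after_caption = True
        else:
            text_lines.append(line)
    flush_text()
    flush_group()
    return blocks
```

Fed the catalog example from the summary, this yields one `TextBlock`, one four-item `MediaGroupBlock` with a caption on every item, and one single-item group for the featured image.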

Structural Additions

- ContentBlock dataclasses (TextBlock, ImageBlock,
  MediaGroupBlock, MediaGroupItem) in gateway/platforms/base.py.
- Default send_media_group() fallback on BasePlatformAdapter so
  adapters without native album support degrade gracefully to
  sequential single sends.
- StreamConsumer.response_message_id property exposes the
  already-streamed message ID so the deliverer can edit it in place.
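A rough sketch of what a default sequential fallback could look like. `BaseAdapterSketch`, the async signatures, and the `send_photo` parameters are assumptions made for illustration; the real `BasePlatformAdapter` interface may differ.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, List

logger = logging.getLogger("gateway.sketch")


@dataclass
class MediaGroupItem:
    path: str
    caption: str = ""


class BaseAdapterSketch:
    """Hypothetical stand-in for BasePlatformAdapter."""

    async def send_photo(self, chat_id: int, path: str, caption: str = "") -> Any:
        raise NotImplementedError

    async def send_media_group(self, chat_id: int, items: List[MediaGroupItem]) -> List[Any]:
        # No native album support here: log it so the degradation is
        # observable, then send each item on its own, preserving order.
        logger.warning("Falling back to %d individual sends", len(items))
        results = []
        for item in items:
            results.append(await self.send_photo(chat_id, item.path, item.caption))
        return results
```

An adapter with native albums would override `send_media_group()`; one without inherits the sequential behaviour and keeps each item's caption.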

Prompt-Side Adoption (added in this PR)

The delivery layer accepts the per-item caption shape, but getting
non-Anthropic models to produce it reliably needs prompt guidance.
agent/prompt_builder.py now appends a shared _MEDIA_GROUP_RULES
block to the telegram, whatsapp, discord, slack, signal,
and bluebubbles platform hints. The block is example-first, lists
the split/merge pitfalls explicitly, and points at the
telegram-media-group-captions skill for complex cases.

weixin/wecom are intentionally skipped — native album semantics
there differ from Telegram/Signal and warrant a separate pass.

Verified on live Telegram

Model	Before rules	After rules
gpt-5.4	already followed the shape	unchanged
gemma-4-26b (heretic-apex)	split each image into its own message unless the user named the skill	single album, per-item captions, no skill hint needed

Tests

- tests/agent/test_prompt_builder.py::TestMediaGroupRules adds 92
  cases covering: rule presence on each included platform, absence on
  weixin/email, the exact example block, and the bullet rules.
- Full test_prompt_builder suite: 150/150 green.

Behavioural Side-Effect Worth Calling Out

Refactoring the caption path through a shared _prepare_caption()
helper means all Telegram send_* methods (send_photo,
send_video, send_document, send_voice, send_audio,
send_animation, etc.) now attempt MarkdownV2 formatting with a
plain-text fallback, where previously they sent captions as plain text
only. This is a deliberate widening — it makes caption behaviour
consistent across all Telegram sends — but it is broader than "just
albums", so flagging it here for reviewer awareness.
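The shared-helper idea can be sketched as follows. The names and signatures are hypothetical (the real `_prepare_caption()` is not shown here), `CaptionRejected` stands in for whatever error the platform raises on bad markup, and the `formatter` argument is an assumed hook for the MarkdownV2 conversion.

```python
from typing import Any, Callable, List, Optional, Tuple


class CaptionRejected(Exception):
    """Hypothetical error raised when the platform rejects caption markup."""


def prepare_caption(
    caption: str, formatter: Callable[[str], str]
) -> List[Tuple[str, Optional[str]]]:
    """Return (text, parse_mode) attempts in the order they should be tried."""
    if not caption:
        return [("", None)]
    return [(formatter(caption), "MarkdownV2"), (caption, None)]


def send_with_caption(
    send: Callable[[str, Optional[str]], Any],
    caption: str,
    formatter: Callable[[str], str],
) -> Any:
    attempts = prepare_caption(caption, formatter)
    for index, (text, parse_mode) in enumerate(attempts):
        try:
            return send(text, parse_mode)
        except CaptionRejected:
            # Re-raise only when the plain-text attempt has failed as well.
            if index == len(attempts) - 1:
                raise
```

Because every send method would route through the same helper, the widening noted above follows directly: any caption, not just an album caption, gets the formatted attempt first.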

Tests

- Parser: 13 tests in tests/gateway/test_platform_base.py cover
  per-item captions, legacy group-level trailing caption attaching to
  the first item, mixed per-item + trailing caption attaching to the
  last item, blank-line group separation, auto-split at Telegram's
  10-item album limit, URL-based media, and the FILE: document
  prefix.
- Streaming regression: new AlreadySentAgent stub and tests in
  tests/gateway/test_run_progress_topics.py exercise the streaming
  delivery path end-to-end.
- Affected suite: test_platform_base.py +
  test_run_progress_topics.py → 105 passed, 0 failed.
- Full suite: 11,234 passed, 0 new failures attributable to this
  PR. The tests/ baseline on main has 79 pre-existing failures,
  unchanged by this PR; this was verified by running the full suite
  on HEAD~1 and HEAD and diffing the failure sets.
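The baseline comparison amounts to a set difference over failing test IDs. A minimal sketch, assuming the failing IDs from each run have already been collected into iterables (how they are collected is not described here, and the helper name is hypothetical):

```python
from typing import Iterable, List


def new_failures(baseline_failed: Iterable[str], head_failed: Iterable[str]) -> List[str]:
    """Test IDs that fail on HEAD but did not fail on the baseline commit."""
    return sorted(set(head_failed) - set(baseline_failed))
```

An empty result means every failure on HEAD was already present on the baseline, which is the claim made for this PR.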

Manual Testing

Verified on a live Telegram deployment with a multi-item product
catalogue response (multiple items × multiple views). Confirmed:

- Album forms correctly with each item's caption attached.
- Streamed prose message ends up with intro text only.
- Single trailing image with caption still works.
- FILE:MEDIA: prefix still delivers as a document.
- MarkdownV2 in captions (**bold**, ||spoiler||) renders correctly
  in single images and in album items.

Files Changed

File	Change
`gateway/platforms/base.py`	Parser rewrite, `ContentBlock` types, default `send_media_group()` fallback
`gateway/platforms/telegram.py`	`send_media_group()` byte-buffer fix, `_prepare_caption()` helper, `delete_message()`
`gateway/run.py`	`_deliver_media_from_response()` edits streamed message when prose + captioned media coexist
`gateway/stream_consumer.py`	`response_message_id` property
`tests/gateway/test_platform_base.py`	Parser tests
`tests/gateway/test_run_progress_topics.py`	Streaming regression tests

…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file handles via open() on retry, silently falling back to individual sends. Use read_bytes() instead so bytes can be reused across the MarkdownV2/plain-text retry attempts.

2. Caption text leaking into streamed message: when the response contains TextBlocks plus captioned media, edit the already-streamed message down to text-only content instead of leaving caption lines visible in the body. Falls back to delete when there are no TextBlocks.

3. Per-item captions within a single album: parser now treats a caption line immediately after a MEDIA: line as that item's caption, keeping all items in one MediaGroupBlock. A blank line after caption text ends the group. Previously the parser assigned only one caption to the whole group, and starting a new group required caption-then-MEDIA which broke up intended albums.

Also adds the ContentBlock data structures (TextBlock, ImageBlock, MediaGroupBlock, MediaGroupItem) and the default send_media_group() fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing caption attaches to last item, blank-line group separation, auto-split at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression tests in test_run_progress_topics.

Follow-up to the delivery-layer fix in c2b8de4. Appends a shared _MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack, signal, and bluebubbles platform hints so non-Anthropic models adopt the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album, no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped: native album semantics there differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:

- gpt-5.4 adopted the album shape without any skill hint
- gemma-4-26b (heretic-apex) adopted it once the rules landed; previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder suite 150/150 green.

MidnightLychee added 2 commits April 14, 2026 09:36

MidnightLychee mentioned this pull request Apr 15, 2026

fix(gateway): Telegram album delivery with per-item captions #10444

Open

5 tasks

alt-glitch added the type/bug, P2, comp/gateway, and platform/telegram labels on Apr 27, 2026

MidnightLychee wants to merge 2 commits into NousResearch:main from MidnightLychee:fix/telegram-media-group-captions
