fix(gateway): Telegram album delivery with per-item captions by MidnightLychee · Pull Request #10444 · NousResearch/hermes-agent

MidnightLychee · 2026-04-15T17:19:01Z

Summary

Restores Telegram album (`sendMediaGroup`) delivery with per-item captions. PR #9297 landed this feature on v0.8, but the v0.9 gateway refactor split delivery into two paths and migrated only one of them. The non-streaming path in `BasePlatformAdapter._process_message_background` kept calling the legacy `extract_media` + `extract_images` + per-item `send_image_file` loop, so album-shaped responses produced one text blob of captions plus N standalone `sendPhoto` calls.

Commits

`fix(gateway): proper media group delivery with per-item captions on Telegram` — the content of the original PR #9297, "feat(gateway): telegram media delivery — reliable albums, no caption leakage, per-item captions" (content-block parser, `MediaGroupBlock`, `send_media_group`, streaming-path wiring, 12 parser tests).
`feat(prompt): teach all chat platforms the per-item album caption shape` — platform hints for the per-item caption format (92 prompt_builder tests).
`fix(gateway): route MediaGroupBlock through send_media_group in non-streaming path` — the v0.9 fix. Detects `MediaGroupBlock` in the parsed response and dispatches `TextBlock` / `ImageBlock` / `MediaGroupBlock` directly, short-circuiting the legacy pipeline only for album-shaped responses so TTS / auto-voice / local-file fallbacks for plain-text replies are untouched.
`test(gateway): e2e regression test for album delivery` — guards the non-streaming path with three cases (3-item album, 12-item auto-split at 10, prose + album). Confirmed to fail without the fix and pass with it.

Behavior after this PR

Response `MEDIA:/path\ncaption\nMEDIA:/path\ncaption\n...` with 2+ items dispatches as a single `sendMediaGroup` with per-item captions.
Albums >10 items auto-split into groups of 10 + remainder (Telegram's cap).
Blank line between MEDIA: lines ends the album and starts a new one.
Prose outside MEDIA: blocks is sent as a separate text message with `MEDIA:` tokens stripped.
The streaming path (`_deliver_media_from_response`) is unchanged — it was already correct on main.
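
The parsing rules above can be illustrated with a minimal sketch. This is not the code from this PR: the block names mirror the ones mentioned here (`TextBlock`, `ImageBlock`, `MediaGroupBlock`, `MediaGroupItem`), but the field layout, the `parse_blocks` helper, and the choice to emit a lone remainder item as an `ImageBlock` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

GROUP_CAP = 10  # Telegram's per-album cap, as stated in this PR


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


@dataclass
class ImageBlock:
    item: MediaGroupItem


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


Block = Union[TextBlock, ImageBlock, MediaGroupBlock]


def parse_blocks(response: str) -> List[Block]:
    """Hypothetical parser: MEDIA: lines with optional per-item captions."""
    blocks: List[Block] = []
    prose: List[str] = []
    album: List[MediaGroupItem] = []
    caption_slot_open = False  # True only on the line right after a MEDIA: line

    def flush_prose() -> None:
        text = "\n".join(prose).strip()
        if text:
            blocks.append(TextBlock(text))
        prose.clear()

    def flush_album() -> None:
        # Split the pending album into chunks of at most GROUP_CAP items.
        for start in range(0, len(album), GROUP_CAP):
            chunk = album[start:start + GROUP_CAP]
            if len(chunk) == 1:
                blocks.append(ImageBlock(chunk[0]))  # assumption: single item is not an album
            else:
                blocks.append(MediaGroupBlock(list(chunk)))
        album.clear()

    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith("MEDIA:"):
            flush_prose()
            album.append(MediaGroupItem(path=stripped[len("MEDIA:"):].strip()))
            caption_slot_open = True
        elif not stripped:
            flush_album()  # a blank line ends the current album
            caption_slot_open = False
            if prose:
                prose.append("")
        elif caption_slot_open:
            album[-1].caption = stripped  # line right after MEDIA: is that item's caption
            caption_slot_open = False
        else:
            flush_album()  # assumption: any other text is prose and ends the album
            prose.append(stripped)

    flush_album()
    flush_prose()
    return blocks
```

Under these assumptions, `parse_blocks("Here you go\nMEDIA:/a.png\nfirst\nMEDIA:/b.png\nsecond")` yields one `TextBlock` followed by one two-item `MediaGroupBlock`, and a 12-item response yields groups of 10 and 2.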

Fixes #9291.

Test plan

`tests/gateway/test_platform_base.py` — 100+ parser unit tests
`tests/gateway/test_album_delivery_e2e.py` — 3 new e2e cases
`tests/agent/test_prompt_builder.py` — 92 prompt-builder cases including `TestMediaGroupRules`
Full Telegram suite (`test_telegram_*`) green
Manually verified on a live Telegram deployment (restarted gateway, requested a 16-image album, received two albums of 10 + 6 with per-item captions and no leading caption blob)

…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file handles via open() on retry, silently falling back to individual sends. Use read_bytes() instead so bytes can be reused across the MarkdownV2/plain-text retry attempts.

2. Caption text leaking into streamed message: when the response contains TextBlocks plus captioned media, edit the already-streamed message down to text-only content instead of leaving caption lines visible in the body. Falls back to delete when there are no TextBlocks.

3. Per-item captions within a single album: parser now treats a caption line immediately after a MEDIA: line as that item's caption, keeping all items in one MediaGroupBlock. A blank line after caption text ends the group. Previously the parser assigned only one caption to the whole group, and starting a new group required caption-then-MEDIA which broke up intended albums.

Also adds the ContentBlock data structures (TextBlock, ImageBlock, MediaGroupBlock, MediaGroupItem) and the default send_media_group() fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing caption attaches to last item, blank-line group separation, auto-split at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression tests in test_run_progress_topics.
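
The first fix in this commit (reading file contents once so the same bytes can be reused across the MarkdownV2 and plain-text attempts) can be sketched roughly as follows. This is a hedged illustration, not the adapter code: `send_album_with_fallback`, `FormattingError`, and the `send_fn(payload, parse_mode)` callback shape are hypothetical names introduced here. Only `pathlib.Path.read_bytes()` is a real standard-library call.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, List, Optional, Tuple

Payload = List[Tuple[bytes, Optional[str]]]  # (file bytes, caption) pairs
SendFn = Callable[[Payload, Optional[str]], Awaitable[None]]


class FormattingError(Exception):
    """Hypothetical stand-in for a caption-markup rejection by the platform."""


async def send_album_with_fallback(
    items: List[Tuple[str, Optional[str]]], send_fn: SendFn
) -> Optional[str]:
    # Read every file exactly once. Bytes objects are immutable and have no
    # read position, so the retry below sends the same complete content.
    payload: Payload = [(Path(path).read_bytes(), caption) for path, caption in items]
    try:
        await send_fn(payload, "MarkdownV2")
        return "MarkdownV2"
    except FormattingError:
        # Retry the whole album as plain text instead of degrading
        # to one send per image.
        await send_fn(payload, None)
        return None
```

The return value reports which parse mode succeeded, which is only a convenience for this sketch.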

Follow-up to the delivery-layer fix in c2b8de4. Appends a shared _MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack, signal, and bluebubbles platform hints so non-Anthropic models adopt the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album, no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped; native album semantics there differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:

- gpt-5.4 adopted the album shape without any skill hint
- gemma-4-26b (heretic-apex) adopted it once the rules landed; previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder suite 150/150 green.

…treaming path

The feature added _parse_content_blocks + send_media_group to BasePlatformAdapter, and wired the streaming already_sent path through them, but left _process_message_background using the legacy extract_media + extract_images + per-item send_image_file loop. On platforms where streaming is off (or the response arrived without streaming), a response containing MEDIA:/caption pairs produced one text blob with all captions and N individual sendPhoto calls, never a real Telegram album.

Detect MediaGroupBlock in the parsed response and dispatch TextBlock / ImageBlock / MediaGroupBlock directly, short-circuiting the legacy pipeline only for album-shaped responses so TTS/auto-TTS/local-file fallbacks for plain-text replies are untouched.
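
The short-circuit described in this commit can be sketched as below. The method names `send`, `send_image_file`, and `send_media_group` appear in this PR, but their signatures here, the `RecordingAdapter` class, the `deliver_blocks` helper, and the block field layout are assumptions for illustration only.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    path: str
    caption: Optional[str] = None


@dataclass
class MediaGroupBlock:
    items: List[Tuple[str, Optional[str]]]  # (path, caption) pairs


Block = Union[TextBlock, ImageBlock, MediaGroupBlock]


class RecordingAdapter:
    """Hypothetical stand-in for a platform adapter; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    async def send(self, chat_id: int, text: str) -> None:
        self.calls.append(("send", text))

    async def send_image_file(
        self, chat_id: int, path: str, caption: Optional[str] = None
    ) -> None:
        self.calls.append(("send_image_file", (path, caption)))

    async def send_media_group(
        self, chat_id: int, items: List[Tuple[str, Optional[str]]]
    ) -> None:
        self.calls.append(("send_media_group", list(items)))


async def deliver_blocks(adapter: RecordingAdapter, chat_id: int, blocks: List[Block]) -> bool:
    """Return True if the response was album-shaped and fully handled here."""
    if not any(isinstance(b, MediaGroupBlock) for b in blocks):
        return False  # not album-shaped: the caller keeps using the legacy pipeline
    for block in blocks:
        if isinstance(block, TextBlock):
            await adapter.send(chat_id, block.text)
        elif isinstance(block, ImageBlock):
            await adapter.send_image_file(chat_id, block.path, block.caption)
        else:
            await adapter.send_media_group(chat_id, block.items)
    return True
```

Returning a boolean keeps the legacy path as the default: only a response that contains at least one `MediaGroupBlock` bypasses it, which matches the stated goal of leaving plain-text replies untouched.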

…s_message_background

Covers the non-streaming delivery path that bypassed _parse_content_blocks and produced a caption-blob + N individual sendPhoto calls instead of a single sendMediaGroup.

Three cases:

- 3-item album dispatches one send_media_group with per-item captions; no send_image_file or send_image calls; no caption text dumped to send()
- 12-item response splits into 10 + 2 albums (Telegram's group cap)
- Prose before MEDIA: lines is sent as a text message with MEDIA: tokens stripped; album delivered separately

Guards against future upstream refactors that reintroduce a bypass of the content-block dispatcher.

…d-media-group skill

_MEDIA_GROUP_RULES pointed at 'telegram-media-group-captions', but no such skill file exists in the tree. Claude/GPT models still worked because they learn directly from the inline few-shot example, but smaller / open-weight models (gemma4, etc.) tried to look up the named skill in Hermes' auto-injected skill listing, found nothing, and degraded to needing the skill name spelled in the user turn.

Since album delivery now spans five platforms (Telegram, Discord, Signal, Slack, Feishu), rename the referenced skill to the platform-neutral 'send-media-group' and ship the actual SKILL.md file with it.

The new skill's description carries keyword triggers (album, media group, gallery, photo collection, multi-image, per-item captions, every platform name) so the skill index surfaces it whenever a model is about to deliver 2+ images. Body covers the exact MEDIA: response shape, all edge cases (auto-split at 10, shared vs per-item captions, blank-line separators), and how each of the 5 native-grouping platforms renders the result.

Minimum-footprint fix:

- 1 new SKILL.md file
- 1 string replaced in agent/prompt_builder.py
- 1 test assertion updated

No change to adapter code, dispatcher, parser, or other skills.

alt-glitch · 2026-04-26T01:45:18Z

Related to #9291 (root issue) and #9297 (original v0.8 implementation that was partially lost in v0.9 refactor).

Port the Telegram media-group/per-item caption fix from PR NousResearch#10444 onto upstream main at 6f2dab2. This restores album-shaped MEDIA responses so per-item captions are preserved and Telegram can send them via send_media_group instead of falling back to individual image sends.

Changes:

- Add content-block parsing for text, single media, and media groups
- Route album-shaped non-streaming responses through send_media_group
- Add Telegram media-group delivery with caption formatting fallback
- Add prompt guidance and the send-media-group skill reference
- Add regression coverage for parser, streaming, and non-streaming album delivery

Verification:

- py_compile passed for touched Python files
- pytest not run locally because this checkout has no venv/uv/pytest environment

MidnightLychee and others added 5 commits April 16, 2026 00:46

MidnightLychee force-pushed the fix/telegram-album-delivery-v2 branch from 646bbf5 to a476708 Compare April 15, 2026 18:40

alt-glitch added the type/bug, P2, platform/telegram, and comp/gateway labels on Apr 26, 2026

MidnightLychee wants to merge 5 commits into NousResearch:main from MidnightLychee:fix/telegram-album-delivery-v2

2 participants