fix(gateway): Telegram album delivery with per-item captions#10444

Open
MidnightLychee wants to merge 5 commits into NousResearch:main from MidnightLychee:fix/telegram-album-delivery-v2

Conversation

@MidnightLychee


Summary

Restores Telegram album (sendMediaGroup) delivery with per-item captions. PR #9297 landed this feature on v0.8, but the v0.9 gateway refactor split delivery into two paths and migrated only one of them. The non-streaming path in `BasePlatformAdapter._process_message_background` kept calling the legacy `extract_media` + `extract_images` + per-item `send_image_file` loop, so album-shaped responses produced one text blob of captions plus N standalone sendPhoto calls.

Commits

  1. `fix(gateway): proper media group delivery with per-item captions on Telegram`: the content of the original PR #9297, "feat(gateway): telegram media delivery - reliable albums, no caption leakage, per-item captions" (content-block parser, `MediaGroupBlock`, `send_media_group`, streaming-path wiring, 12 parser tests).
  2. `feat(prompt): teach all chat platforms the per-item album caption shape` — platform hints for the per-item caption format (92 prompt_builder tests).
  3. `fix(gateway): route MediaGroupBlock through send_media_group in non-streaming path` — the v0.9 fix. Detects `MediaGroupBlock` in the parsed response and dispatches `TextBlock` / `ImageBlock` / `MediaGroupBlock` directly, short-circuiting the legacy pipeline only for album-shaped responses so TTS / auto-voice / local-file fallbacks for plain-text replies are untouched.
  4. `test(gateway): e2e regression test for album delivery` — guards the non-streaming path with three cases (3-item album, 12-item auto-split at 10, prose + album). Confirmed to fail without the fix and pass with it.

Behavior after this PR

  • Response `MEDIA:/path\ncaption\nMEDIA:/path\ncaption\n...` with 2+ items dispatches as a single `sendMediaGroup` with per-item captions.
  • Albums >10 items auto-split into groups of 10 + remainder (Telegram's cap).
  • A blank line between `MEDIA:` lines ends the current album and starts a new one.
  • Prose outside MEDIA: blocks is sent as a separate text message with `MEDIA:` tokens stripped.
  • The streaming path (`_deliver_media_from_response`) is unchanged — it was already correct on main.
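
The response shape listed above can be sketched as a small line-oriented parser. This is a minimal illustration and not the PR's `_parse_content_blocks`: the function name `parse_blocks`, the dataclass fields, and the `TELEGRAM_GROUP_CAP` constant are assumptions. Only the block type names and the rules (per-item captions, a blank line ends an album, split at 10) come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

TELEGRAM_GROUP_CAP = 10  # assumed constant name; the cap itself is from the PR text


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    item: MediaGroupItem


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem]


Block = Union[TextBlock, ImageBlock, MediaGroupBlock]


def parse_blocks(response: str) -> List[Block]:
    """Hypothetical parser for MEDIA:/caption responses."""
    blocks: List[Block] = []
    run: List[MediaGroupItem] = []  # consecutive MEDIA: items forming one album
    prose: List[str] = []           # text lines outside any album

    def flush_prose() -> None:
        text = "\n".join(prose).strip()
        if text:
            blocks.append(TextBlock(text))
        prose.clear()

    def flush_run() -> None:
        # Split the run into groups of at most TELEGRAM_GROUP_CAP items.
        for start in range(0, len(run), TELEGRAM_GROUP_CAP):
            chunk = run[start:start + TELEGRAM_GROUP_CAP]
            if len(chunk) == 1:
                blocks.append(ImageBlock(chunk[0]))
            else:
                blocks.append(MediaGroupBlock(chunk))
        run.clear()

    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith("MEDIA:"):
            flush_prose()
            run.append(MediaGroupItem(stripped[len("MEDIA:"):].strip()))
        elif not stripped:
            flush_run()  # a blank line ends the current album
            if prose:
                prose.append("")  # keep paragraph breaks inside prose
        elif run and run[-1].caption is None:
            run[-1].caption = stripped  # line right after MEDIA: is its caption
        else:
            flush_run()
            prose.append(stripped)

    flush_run()
    flush_prose()
    return blocks
```

With this sketch, a 12-item response yields two `MediaGroupBlock`s of 10 and 2 items, and a lone `MEDIA:` line yields an `ImageBlock` rather than a one-item album.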

Fixes #9291.

Test plan

  • `tests/gateway/test_platform_base.py` — 100+ parser unit tests
  • `tests/gateway/test_album_delivery_e2e.py` — 3 new e2e cases
  • `tests/agent/test_prompt_builder.py` — 92 prompt-builder cases including `TestMediaGroupRules`
  • Full Telegram suite (`test_telegram_*`) green
  • Manually verified on a live Telegram deployment (restarted gateway, requested a 16-image album, received two albums of 10 + 6 with per-item captions and no leading caption blob)

MidnightLychee and others added 5 commits April 16, 2026 00:46
…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file
   handles via open() on retry, silently falling back to individual
   sends. Use read_bytes() instead so bytes can be reused across the
   MarkdownV2/plain-text retry attempts.

2. Caption text leaking into streamed message: when the response
   contains TextBlocks plus captioned media, edit the already-streamed
   message down to text-only content instead of leaving caption lines
   visible in the body. Falls back to delete when there are no
   TextBlocks.

3. Per-item captions within a single album: parser now treats a
   caption line immediately after a MEDIA: line as that item's
   caption, keeping all items in one MediaGroupBlock. A blank line
   after caption text ends the group. Previously the parser assigned
   only one caption to the whole group, and starting a new group
   required caption-then-MEDIA which broke up intended albums.
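
To illustrate item 1, here is a minimal sketch of reading each file once with `Path.read_bytes()` and reusing the same bytes across retry attempts. The function `send_album_with_retry`, the `Sender` callable type, and the use of `ValueError` as the formatting-failure signal are assumptions made for the example; the real `send_media_group()` talks to the Telegram client and is not shown here.

```python
from pathlib import Path
from typing import Callable, List, Optional, Sequence

# Hypothetical sender signature: receives raw bytes plus captions.
Sender = Callable[[List[bytes], Sequence[str]], str]


def send_album_with_retry(
    paths: Sequence[str],
    captions: Sequence[str],
    attempts: Sequence[Sender],
) -> str:
    """Try each sender in order (e.g. MarkdownV2 first, then plain text)."""
    # Read every file exactly once. Bytes objects can be passed to any
    # number of attempts, unlike an open file handle that has already
    # been consumed by a failed first attempt.
    payloads = [Path(p).read_bytes() for p in paths]

    last_error: Optional[Exception] = None
    for attempt in attempts:
        try:
            return attempt(payloads, captions)
        except ValueError as exc:  # assumed stand-in for a formatting error
            last_error = exc
    raise RuntimeError("all album delivery attempts failed") from last_error
```

Because no handle is opened by the caller, there is nothing to close or rewind between the MarkdownV2 attempt and the plain-text attempt.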

Also adds the ContentBlock data structures (TextBlock, ImageBlock,
MediaGroupBlock, MediaGroupItem) and the default send_media_group()
fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing
caption attaches to last item, blank-line group separation, auto-split
at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression
tests in test_run_progress_topics.

Follow-up to the delivery-layer fix in c2b8de4. Appends a shared
_MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack,
signal, and bluebubbles platform hints so non-Anthropic models adopt
the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album,
  no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped — native album semantics there
differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:
  - gpt-5.4 adopted the album shape without any skill hint
  - gemma-4-26b (heretic-apex) adopted it once the rules landed;
    previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder
suite 150/150 green.
…treaming path

The feature added _parse_content_blocks + send_media_group to BasePlatformAdapter,
and wired the streaming already_sent path through them, but left
_process_message_background using the legacy extract_media + extract_images +
per-item send_image_file loop. On platforms where streaming is off (or the
response arrived without streaming), a response containing MEDIA:/caption pairs
produced one text blob with all captions and N individual sendPhoto calls —
never a real Telegram album.

Detect MediaGroupBlock in the parsed response and dispatch TextBlock /
ImageBlock / MediaGroupBlock directly, short-circuiting the legacy pipeline
only for album-shaped responses so TTS/auto-TTS/local-file fallbacks for
plain-text replies are untouched.
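
The short-circuit described in this commit message could look roughly like the sketch below. It is synchronous for brevity, and the `deliver` function, the `legacy_pipeline` callback, the dataclass fields, and the method signatures on `RecordingAdapter` are assumptions; only the method and block names come from the PR text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class MediaGroupItem:
    path: str
    caption: Optional[str] = None


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    item: MediaGroupItem


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem]


Block = Union[TextBlock, ImageBlock, MediaGroupBlock]


class RecordingAdapter:
    """Stand-in adapter that records calls instead of contacting a platform."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, object]] = []

    def send(self, text: str) -> None:
        self.calls.append(("send", text))

    def send_image_file(self, path: str, caption: Optional[str] = None) -> None:
        self.calls.append(("send_image_file", (path, caption)))

    def send_media_group(self, items: List[MediaGroupItem]) -> None:
        self.calls.append(("send_media_group", len(items)))


def deliver(adapter: RecordingAdapter, blocks: List[Block],
            legacy_pipeline: Callable[[], None]) -> bool:
    """Dispatch blocks directly only when the response is album-shaped."""
    if not any(isinstance(b, MediaGroupBlock) for b in blocks):
        legacy_pipeline()  # plain replies keep the existing TTS and fallback path
        return False
    for block in blocks:
        if isinstance(block, TextBlock):
            adapter.send(block.text)
        elif isinstance(block, ImageBlock):
            adapter.send_image_file(block.item.path, block.item.caption)
        else:
            adapter.send_media_group(block.items)
    return True
```

Gating on the presence of a `MediaGroupBlock` keeps the change narrow: responses without an album never enter the new dispatch loop.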
…s_message_background

Covers the non-streaming delivery path that bypassed _parse_content_blocks
and produced a caption-blob + N individual sendPhoto calls instead of a
single sendMediaGroup. Three cases:

- 3-item album dispatches one send_media_group with per-item captions;
  no send_image_file or send_image calls; no caption text dumped to send()
- 12-item response splits into 10 + 2 albums (Telegram's group cap)
- Prose before MEDIA: lines is sent as a text message with MEDIA: tokens
  stripped; album delivered separately

Guards against future upstream refactors that reintroduce a bypass of the
content-block dispatcher.
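
The kind of assertion the 12-item case makes can be sketched with `unittest.mock`. The helpers `chunk_album` and `deliver_album` are hypothetical stand-ins for the real dispatcher, and the test name is invented for the example; the actual cases live in `tests/gateway/test_album_delivery_e2e.py`.

```python
from typing import List, Sequence
from unittest.mock import MagicMock

GROUP_CAP = 10  # Telegram's per-album item cap, as stated in the PR


def chunk_album(items: Sequence[str], cap: int = GROUP_CAP) -> List[List[str]]:
    """Split items into consecutive groups of at most `cap` entries."""
    return [list(items[i:i + cap]) for i in range(0, len(items), cap)]


def deliver_album(adapter, items: Sequence[str]) -> None:
    """Hypothetical stand-in for the dispatcher: one call per group."""
    for group in chunk_album(items):
        adapter.send_media_group(group)


def test_twelve_items_split_into_two_albums() -> None:
    adapter = MagicMock()
    deliver_album(adapter, [f"/img{i}.png" for i in range(12)])

    # Two albums: the first full group of 10, then the remainder of 2.
    sizes = [len(call.args[0]) for call in adapter.send_media_group.call_args_list]
    assert sizes == [10, 2]

    # The legacy per-item path and the caption blob must not be used.
    adapter.send_image_file.assert_not_called()
    adapter.send.assert_not_called()
```

Asserting that the legacy methods were never called is what turns this into a regression guard rather than a plain happy-path check.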
…d-media-group skill

_MEDIA_GROUP_RULES pointed at 'telegram-media-group-captions', but no
such skill file exists in the tree. Claude/GPT models still worked
because they learn directly from the inline few-shot example, but
smaller / open-weight models (gemma4, etc.) tried to look up the named
skill in Hermes' auto-injected skill listing, found nothing, and
degraded to needing the skill name spelled in the user turn.

Since album delivery now spans five platforms (Telegram, Discord,
Signal, Slack, Feishu), rename the referenced skill to the
platform-neutral 'send-media-group' and ship the actual SKILL.md file
with it.

The new skill's description carries keyword triggers (album, media
group, gallery, photo collection, multi-image, per-item captions,
every platform name) so the skill index surfaces it whenever a model
is about to deliver 2+ images. Body covers the exact MEDIA: response
shape, all edge cases (auto-split at 10, shared vs per-item captions,
blank-line separators), and how each of the 5 native-grouping
platforms renders the result.

Minimum-footprint fix:
  - 1 new SKILL.md file
  - 1 string replaced in agent/prompt_builder.py
  - 1 test assertion updated

No change to adapter code, dispatcher, parser, or other skills.
MidnightLychee force-pushed the fix/telegram-album-delivery-v2 branch from 646bbf5 to a476708 on April 15, 2026 18:40
alt-glitch added the type/bug, P2, platform/telegram, and comp/gateway labels on Apr 26, 2026
@alt-glitch

Collaborator

Related to #9291 (root issue) and #9297 (original v0.8 implementation that was partially lost in v0.9 refactor).

MidnightLychee added a commit to MidnightLychee/hermes-agent that referenced this pull request May 3, 2026
Port the Telegram media-group/per-item caption fix from PR NousResearch#10444 onto upstream main at 6f2dab2.

This restores album-shaped MEDIA responses so per-item captions are preserved and Telegram can send them via send_media_group instead of falling back to individual image sends.

Changes:
- Add content-block parsing for text, single media, and media groups
- Route album-shaped non-streaming responses through send_media_group
- Add Telegram media-group delivery with caption formatting fallback
- Add prompt guidance and the send-media-group skill reference
- Add regression coverage for parser, streaming, and non-streaming album delivery

Verification:
- py_compile passed for touched Python files
- pytest not run locally because this checkout has no venv/uv/pytest environment


Development

Successfully merging this pull request may close these issues.

Telegram media delivery: albums silently degrade, captions leak into streamed text, and per-item captions aren't supported
