fix(gateway): Telegram album delivery with per-item captions#10444
Open
MidnightLychee wants to merge 5 commits into
…elegram

Three bugs fixed in the streaming media delivery pipeline:

1. Album not forming on Telegram: send_media_group() leaked file handles via open() on retry, silently falling back to individual sends. Use read_bytes() instead so bytes can be reused across the MarkdownV2/plain-text retry attempts.

2. Caption text leaking into streamed message: when the response contains TextBlocks plus captioned media, edit the already-streamed message down to text-only content instead of leaving caption lines visible in the body. Falls back to delete when there are no TextBlocks.

3. Per-item captions within a single album: parser now treats a caption line immediately after a MEDIA: line as that item's caption, keeping all items in one MediaGroupBlock. A blank line after caption text ends the group. Previously the parser assigned only one caption to the whole group, and starting a new group required caption-then-MEDIA which broke up intended albums.

Also adds the ContentBlock data structures (TextBlock, ImageBlock, MediaGroupBlock, MediaGroupItem) and the default send_media_group() fallback in BasePlatformAdapter.

Tests: 12 new parser tests covering per-item captions, trailing caption attaches to last item, blank-line group separation, auto-split at 10 items, URL media, FILE: prefix. Plus AlreadySentAgent regression tests in test_run_progress_topics.
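The per-item caption rule in item 3 can be illustrated with a small parser. This is a minimal sketch, not the PR's `_parse_content_blocks`: the function name `parse_blocks`, the field names, and the choice to keep a lone item as a one-element group are assumptions made for illustration, and the FILE: prefix and ImageBlock handling are left out. It also assumes that any blank line closes the open group.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

MAX_GROUP_ITEMS = 10  # group cap named in the commit message ("auto-split at 10 items")


@dataclass
class TextBlock:
    text: str


@dataclass
class MediaGroupItem:
    source: str                    # path or URL following "MEDIA:"
    caption: Optional[str] = None  # caption line(s) directly after the MEDIA: line


@dataclass
class MediaGroupBlock:
    items: List[MediaGroupItem] = field(default_factory=list)


Block = Union[TextBlock, MediaGroupBlock]


def parse_blocks(response: str) -> List[Block]:
    """Hypothetical parser: split a response into text and media-group blocks."""
    blocks: List[Block] = []
    prose: List[str] = []
    group: Optional[MediaGroupBlock] = None

    def flush_prose() -> None:
        text = "\n".join(prose).strip()
        if text:
            blocks.append(TextBlock(text))
        prose.clear()

    def flush_group() -> None:
        nonlocal group
        if group is not None and group.items:
            blocks.append(group)
        group = None

    for raw in response.splitlines():
        line = raw.strip()
        if line.startswith("MEDIA:"):
            flush_prose()
            # Start a new group when none is open or the open one is full.
            if group is None or len(group.items) >= MAX_GROUP_ITEMS:
                flush_group()
                group = MediaGroupBlock()
            group.items.append(MediaGroupItem(line[len("MEDIA:"):].strip()))
        elif not line:
            if group is not None:
                flush_group()      # assumption: a blank line ends the album
            elif prose:
                prose.append("")   # keep paragraph breaks inside prose
        elif group is not None:
            # A non-blank line inside a group is the caption of the last item.
            last = group.items[-1]
            last.caption = line if last.caption is None else f"{last.caption}\n{line}"
        else:
            prose.append(line)

    flush_group()
    flush_prose()
    return blocks
```

With this shape, a response of two `MEDIA:` lines each followed by one caption line yields a single group whose two items carry their own captions, and twelve consecutive `MEDIA:` lines yield groups of ten and two.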
Follow-up to the delivery-layer fix in c2b8de4. Appends a shared _MEDIA_GROUP_RULES block to the telegram, whatsapp, discord, slack, signal, and bluebubbles platform hints so non-Anthropic models adopt the new response shape reliably:

- Example-first (small models imitate the shape directly)
- Explicit "2 or more ... ALWAYS" trigger
- Rules call out the split/merge pitfalls (blank line ends album, no prose between MEDIA: lines, trailing single caption shares)
- Points at the telegram-media-group-captions skill for complex cases

weixin/wecom intentionally skipped: native album semantics there differ from Telegram/Signal and warrant a separate pass.

Manually verified on a live Telegram deployment with two models:

- gpt-5.4 adopted the album shape without any skill hint
- gemma-4-26b (heretic-apex) adopted it once the rules landed; previously needed an explicit skill-name hint in the user turn

Tests: 92 new cases in TestMediaGroupRules, full test_prompt_builder suite 150/150 green.
…treaming path

The feature added _parse_content_blocks + send_media_group to BasePlatformAdapter, and wired the streaming already_sent path through them, but left _process_message_background using the legacy extract_media + extract_images + per-item send_image_file loop.

On platforms where streaming is off (or the response arrived without streaming), a response containing MEDIA:/caption pairs produced one text blob with all captions and N individual sendPhoto calls, never a real Telegram album.

Detect MediaGroupBlock in the parsed response and dispatch TextBlock / ImageBlock / MediaGroupBlock directly, short-circuiting the legacy pipeline only for album-shaped responses so TTS/auto-TTS/local-file fallbacks for plain-text replies are untouched.
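The short-circuit described above can be sketched as follows. Everything here is illustrative rather than taken from the PR: `dispatch_blocks`, `RecordingAdapter`, the `legacy_deliver` callback, and the method signatures are hypothetical, and plain functions are used where the real adapter methods may be coroutines.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    source: str
    caption: Optional[str] = None


@dataclass
class MediaGroupBlock:
    items: List[Tuple[str, Optional[str]]]  # (source, caption) pairs


Block = Union[TextBlock, ImageBlock, MediaGroupBlock]


class RecordingAdapter:
    """Hypothetical stand-in for a platform adapter; records calls instead of sending."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def send(self, chat_id: str, text: str) -> None:
        self.calls.append(("send", text))

    def send_image_file(self, chat_id: str, source: str, caption: Optional[str] = None) -> None:
        self.calls.append(("send_image_file", source, caption))

    def send_media_group(self, chat_id: str, items: List[Tuple[str, Optional[str]]]) -> None:
        self.calls.append(("send_media_group", list(items)))


def dispatch_blocks(
    adapter: RecordingAdapter,
    chat_id: str,
    blocks: List[Block],
    legacy_deliver: Callable[[], None],
) -> bool:
    """Return True if block dispatch handled the response, False if legacy did."""
    # Only album-shaped responses bypass the legacy pipeline, so plain-text
    # replies keep whatever fallbacks that pipeline provides.
    if not any(isinstance(b, MediaGroupBlock) for b in blocks):
        legacy_deliver()
        return False

    for block in blocks:
        if isinstance(block, TextBlock):
            adapter.send(chat_id, block.text)
        elif isinstance(block, ImageBlock):
            adapter.send_image_file(chat_id, block.source, block.caption)
        else:
            adapter.send_media_group(chat_id, block.items)
    return True
```

The design point is that the check runs once per response: a single MediaGroupBlock anywhere in the parsed output routes every block through the dispatcher, so captions are never also emitted as body text.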
…s_message_background

Covers the non-streaming delivery path that bypassed _parse_content_blocks and produced a caption-blob + N individual sendPhoto calls instead of a single sendMediaGroup.

Three cases:

- 3-item album dispatches one send_media_group with per-item captions; no send_image_file or send_image calls; no caption text dumped to send()
- 12-item response splits into 10 + 2 albums (Telegram's group cap)
- Prose before MEDIA: lines is sent as a text message with MEDIA: tokens stripped; album delivered separately

Guards against future upstream refactors that reintroduce a bypass of the content-block dispatcher.
…d-media-group skill

_MEDIA_GROUP_RULES pointed at 'telegram-media-group-captions', but no such skill file exists in the tree. Claude/GPT models still worked because they learn directly from the inline few-shot example, but smaller / open-weight models (gemma4, etc.) tried to look up the named skill in Hermes' auto-injected skill listing, found nothing, and degraded to needing the skill name spelled in the user turn.

Since album delivery now spans five platforms (Telegram, Discord, Signal, Slack, Feishu), rename the referenced skill to the platform-neutral 'send-media-group' and ship the actual SKILL.md file with it.

The new skill's description carries keyword triggers (album, media group, gallery, photo collection, multi-image, per-item captions, every platform name) so the skill index surfaces it whenever a model is about to deliver 2+ images. Body covers the exact MEDIA: response shape, all edge cases (auto-split at 10, shared vs per-item captions, blank-line separators), and how each of the 5 native-grouping platforms renders the result.

Minimum-footprint fix:

- 1 new SKILL.md file
- 1 string replaced in agent/prompt_builder.py
- 1 test assertion updated

No change to adapter code, dispatcher, parser, or other skills.
646bbf5 to a476708
Collaborator
MidnightLychee added a commit to MidnightLychee/hermes-agent that referenced this pull request May 3, 2026
Port the Telegram media-group/per-item caption fix from PR NousResearch#10444 onto upstream main at 6f2dab2.

This restores album-shaped MEDIA responses so per-item captions are preserved and Telegram can send them via send_media_group instead of falling back to individual image sends.

Changes:

- Add content-block parsing for text, single media, and media groups
- Route album-shaped non-streaming responses through send_media_group
- Add Telegram media-group delivery with caption formatting fallback
- Add prompt guidance and the send-media-group skill reference
- Add regression coverage for parser, streaming, and non-streaming album delivery

Verification:

- py_compile passed for touched Python files
- pytest not run locally because this checkout has no venv/uv/pytest environment
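The caption formatting fallback, together with the read_bytes() change described in the first commit message, can be sketched like this. It is a sketch under stated assumptions: `send_album_with_fallback`, the `send_group` callable, and `CaptionParseError` are hypothetical names standing in for the adapter's real send call and the platform's caption-parsing failure. The point it illustrates is that a `bytes` payload can be sent twice, whereas a file object opened with open() is at end-of-file after the first attempt and has to be closed explicitly.

```python
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple, Union

# (file bytes, caption) pairs handed to the sender
Payload = List[Tuple[bytes, Optional[str]]]


class CaptionParseError(Exception):
    """Hypothetical stand-in for a 'cannot parse caption markup' API error."""


def send_album_with_fallback(
    items: Sequence[Tuple[Union[str, Path], Optional[str]]],
    send_group: Callable[[Payload, Optional[str]], None],
) -> str:
    """Send an album with MarkdownV2 captions, retrying once as plain text.

    Returns "MarkdownV2" or "plain" depending on which attempt succeeded.
    """
    # Read every file exactly once. No handles stay open, and the same bytes
    # objects are reused unchanged by the retry below.
    payload: Payload = [(Path(path).read_bytes(), caption) for path, caption in items]

    try:
        send_group(payload, "MarkdownV2")
        return "MarkdownV2"
    except CaptionParseError:
        # Same payload, no parse mode: captions go out as plain text.
        send_group(payload, None)
        return "plain"
```

Because the retry stays inside one group send, a caption that fails to parse as markup degrades to a plain-text album rather than to individual image sends.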
Summary
Restores Telegram album (sendMediaGroup) delivery with per-item captions. PR #9297 landed this feature on v0.8, but the v0.9 gateway refactor split delivery into two paths and only one of them was migrated — the non-streaming path in `BasePlatformAdapter._process_message_background` kept calling the legacy `extract_media` + `extract_images` + per-item `send_image_file` loop, so album-shaped responses produced one text blob of captions plus N standalone sendPhoto calls.
Commits
Behavior after this PR
Fixes #9291.
Test plan