feat(gateway): support [[as_document]] directive for skill media routing #19069
Closed
leon7609 wants to merge 1 commit into
Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.
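The extension-based split described above can be pictured roughly as follows. This is a minimal sketch, not Hermes' actual dispatch code: `IMAGE_EXTS` is an assumed stand-in for `_IMAGE_EXTS` (whose real contents are not shown in this PR), and `route_by_extension` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed stand-in for `_IMAGE_EXTS`; the real set is not shown in this PR.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def route_by_extension(paths):
    """Split media paths into (image_paths, document_paths) by extension only."""
    image_paths, document_paths = [], []
    for path in paths:
        if Path(path).suffix.lower() in IMAGE_EXTS:
            # Photo path: recompressed by Telegram on delivery.
            image_paths.append(path)
        else:
            # Document path: original bytes are preserved.
            document_paths.append(path)
    return image_paths, document_paths
```

Under this model a skill has no say in the outcome: a `.jpg` always lands in the photo bucket, whatever its size.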
This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.
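A rough sketch of the detect-and-strip step, assuming the directive is matched as a literal string. `strip_as_document` is a hypothetical helper written for illustration; the real `extract_media` also handles MEDIA: lines and other directives, which this sketch leaves untouched.

```python
AS_DOCUMENT = "[[as_document]]"


def strip_as_document(response: str) -> tuple[str, bool]:
    """Return (cleaned_text, directive_found) for a raw agent response."""
    if AS_DOCUMENT not in response:
        return response, False
    kept = []
    for line in response.splitlines():
        cleaned = line.replace(AS_DOCUMENT, "")
        if cleaned == line:
            kept.append(line)  # line never contained the directive
        elif cleaned.strip():
            kept.append(cleaned.rstrip())  # directive shared a line with text
        # otherwise the line held only the directive and is dropped
    return "\n".join(kept), True
```

Returning the flag separately from the cleaned text lets a dispatch site decide on routing while the user only ever sees the cleaned text.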
Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.
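The all-or-nothing behaviour amounts to moving every image path into the document bucket when the flag is set. A minimal sketch under that assumption; `apply_as_document` is a hypothetical name, and the ordering of the merged list is a guess rather than something this PR specifies.

```python
def apply_as_document(image_paths, non_image_media, as_document):
    """Return (photos, documents) after applying the per-response flag."""
    if not as_document:
        return list(image_paths), list(non_image_media)
    # Flag set: no photos at all, every image is delivered as a document.
    # Appending images after existing documents is an assumed ordering.
    return [], list(non_image_media) + list(image_paths)
```

Because the flag applies to the whole response, there is no per-file case to handle: a skill that wants one photo and one document would send two responses.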
Verified the targeted use case: info-graph emits
Infographic generated (...)
[[as_document]]
MEDIA:/tmp/info-graph-x/infographic.jpg
→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).
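On the skill side, producing a response of this shape could look like the sketch below. `build_skill_response` is a hypothetical helper, not part of info-graph or Hermes; it only assembles the three-line layout shown in the example above.

```python
def build_skill_response(caption, media_path, as_document=True):
    """Assemble a response: caption, optional directive, then the MEDIA: line."""
    lines = [caption]
    if as_document:
        lines.append("[[as_document]]")  # emitted once per response
    lines.append(f"MEDIA:{media_path}")
    return "\n".join(lines)


print(build_skill_response("Infographic generated", "/tmp/info-graph-x/infographic.jpg"))
```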
Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 3, 2026
Why
Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB at 3840×2160) currently lose quality when delivered through Telegram, because membership in Hermes' `_IMAGE_EXTS` routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ ~1280px max edge. Small-glyph legibility is destroyed (Chinese body text in a 28-32px font collapses to ~9px after compression).
The original bytes only survive when the file goes through `send_document`. The dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach `send_document` for files whose extension is NOT in `_IMAGE_EXTS`. Skills can't currently override this from the response side.
This is independent of #15728 / #15837 / #18444 (which handle Telegram-server-side rejections via `Photo_invalid_dimensions`); those fix the case where Telegram gives up on a photo. This PR fixes the case where the photo path succeeds but compresses too aggressively for the use case.
What
Adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: when the directive is present in the agent's response, every image-extension MEDIA: file in that response routes through `send_document` instead of `send_multiple_images` / `sendPhoto`. Telegram's sendDocument doesn't recompress, so the original 1MB JPEG arrives intact.
The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from user-visible cleaned text in `extract_media`, so it never leaks into the user's chat.
Scope
- All-or-nothing per response (matching `[[audio_as_voice]]`'s scope). Skills that need fine control can split into two responses.
- `BasePlatformAdapter._process_message_background`, `GatewayRunner._deliver_media_from_response`, and the `send_message` tool's `_send_telegram` / `_send_to_platform`. The first two fix the agent's normal response path; the third fixes the case where an agent calls `send_message(target='telegram', message='[[as_document]]\n...')` explicitly.
- `_image_paths` → `_non_image_media` swap.
Tests
+3 cases in `TestExtractMedia`, covering directive strip, isolation from the voice flag, and coexistence with `[[audio_as_voice]]`.
All 113 pre-existing media/extract/send tests still pass.
Verification
Tested locally with info-graph (https://github.com/leon7609/info-graph v0.2.2 emits the directive on every successful render). Before patch: 3840×2160 JPG → Telegram delivers compressed photo at ~1280px. After patch: same source → Telegram delivers original 1MB JPEG via `sendDocument`, fully legible.