feat(gateway): support [[as_document]] directive for skill media routing by leon7609 · Pull Request #19069 · NousResearch/hermes-agent

leon7609 · 2026-05-03T04:12:43Z

Why

Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB at 3840×2160) currently lose quality when delivered through Telegram, because Hermes' _IMAGE_EXTS membership routes the file through send_multiple_images → sendMediaGroup, which Telegram's server re-encodes to JPEG @ ~1280px max edge. Small-glyph legibility is destroyed (Chinese body text in a 28-32px font collapses to ~9px after compression).
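The legibility loss follows from the downscale ratio alone. A quick arithmetic sketch, assuming the only change is a uniform resize so the long edge fits the ~1280px cap quoted above (the helper name is hypothetical and JPEG re-encoding artifacts are ignored):

```python
def downscaled_font_px(font_px: float, src_long_edge: int, max_edge: int = 1280) -> float:
    """Glyph height after the image is scaled so its long edge fits max_edge."""
    # Images already within the cap are not upscaled.
    scale = min(1.0, max_edge / src_long_edge)
    return font_px * scale


for px in (28, 32):
    # 3840 -> 1280 is a 1/3 scale factor.
    print(px, "->", round(downscaled_font_px(px, 3840), 1))
# prints:
# 28 -> 9.3
# 32 -> 10.7
```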

The original bytes only survive when the file goes through send_document. The dispatch tables in three places (_process_message_background, _deliver_media_from_response, and the send_message tool's telegram path) only reach send_document for files whose extension is NOT in _IMAGE_EXTS. Skills can't currently override this from the response side.
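The extension-based routing described above can be pictured roughly as follows. This is a simplified sketch, not the actual Hermes dispatch code: the contents of `_IMAGE_EXTS` and the `pick_sender` helper are assumptions made for illustration.

```python
from pathlib import Path

# Assumed contents; the real _IMAGE_EXTS set in Hermes may differ.
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def pick_sender(path: str) -> str:
    """Name the send method that extension-only routing would choose (hypothetical helper)."""
    ext = Path(path).suffix.lower()
    # Image extensions always take the photo path; everything else is a document.
    return "send_multiple_images" if ext in _IMAGE_EXTS else "send_document"
```

Under this rule a `.jpg` can never reach `send_document`, which is the gap the directive closes.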

This is independent of #15728 / #15837 / #18444 (which handle Telegram-server-side rejections via Photo_invalid_dimensions); those fix the case where Telegram gives up on a photo. This PR fixes the case where the photo path succeeds but compresses too aggressively for the use case.

What

Adds an [[as_document]] directive that mirrors the existing [[audio_as_voice]] shape:

Infographic generated (high-density modules × lab pop art × landscape)
[[as_document]]
MEDIA:/private/tmp/info-graph-x/infographic.jpg

When the directive is present in the agent's response, every image-extension MEDIA: file in that response routes through send_document instead of send_multiple_images / sendPhoto. Telegram's sendDocument doesn't recompress, so the original 1MB JPEG arrives intact.

The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from user-visible cleaned text in extract_media, so it never leaks into the user's chat.
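A minimal sketch of the detect-then-strip split, assuming `MEDIA:` tags sit on their own lines. The function names, the regex, and the return shape are illustrative assumptions; the real `extract_media` signature may differ.

```python
import re
from typing import List, Tuple

AS_DOCUMENT = "[[as_document]]"
# Assumption: each MEDIA: tag occupies its own line.
_MEDIA_RE = re.compile(r"^MEDIA:(\S+)[ \t]*$", re.MULTILINE)


def wants_document(raw_response: str) -> bool:
    """Dispatch-site check, run against the raw (unstripped) response."""
    return AS_DOCUMENT in raw_response


def extract_media_sketch(raw_response: str) -> Tuple[List[str], str]:
    """Return (media_paths, cleaned_text) with MEDIA: tags and the directive removed."""
    paths = _MEDIA_RE.findall(raw_response)
    cleaned = _MEDIA_RE.sub("", raw_response).replace(AS_DOCUMENT, "")
    # Drop the blank lines left behind by the removed tags (a simplification).
    cleaned = "\n".join(line for line in cleaned.splitlines() if line.strip())
    return paths, cleaned
```

Checking the flag on the raw response before cleaning matters: once the text has been stripped, the directive is no longer there to detect.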

Scope

All-or-nothing per response (matches [[audio_as_voice]]'s scope). Skills that need fine control can split into two responses.
Patches three dispatch paths: BasePlatformAdapter._process_message_background, GatewayRunner._deliver_media_from_response, and send_message tool's _send_telegram / _send_to_platform. The first two fix the agent's normal response path; the third fixes when an agent calls send_message(target='telegram', message='[[as_document]]\n...') explicitly.
Telegram is the immediate beneficiary. Discord/Slack/Mattermost/Email send_multiple_images implementations don't recompress on upload, so they're unaffected; the directive is harmlessly honored on those platforms via the same _image_paths → _non_image_media swap.
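The `_image_paths` → `_non_image_media` swap mentioned above could look roughly like this. It is a sketch under assumptions: `partition_media` is a hypothetical helper, and the `_IMAGE_EXTS` contents are guessed rather than taken from the patch.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed contents; the real _IMAGE_EXTS set in Hermes may differ.
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def partition_media(paths: List[str], as_document: bool) -> Tuple[List[str], List[str]]:
    """Split media into (image_paths, non_image_media) for one response."""
    image_paths: List[str] = []
    non_image_media: List[str] = []
    for p in paths:
        is_image = Path(p).suffix.lower() in _IMAGE_EXTS
        # With the directive set, images join the document bucket instead.
        if is_image and not as_document:
            image_paths.append(p)
        else:
            non_image_media.append(p)
    return image_paths, non_image_media
```

Because the flag applies to the whole list, the behavior is all-or-nothing per response, in line with the stated scope.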

Tests

+3 cases in TestExtractMedia:

directive stripped from cleaned text
directive doesn't entangle with [[audio_as_voice]]
both directives can coexist in one response

All 113 pre-existing media/extract/send tests still pass.
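For readers who want a feel for the three new cases, here is a self-contained approximation. The `extract_media` below is a stand-in written for this example, with an assumed `([(path, is_voice)], cleaned_text)` return shape; it is not the Hermes function, and the real tests in `TestExtractMedia` may assert different things.

```python
import re
from typing import List, Tuple

_DIRECTIVES = ("[[as_document]]", "[[audio_as_voice]]")
_MEDIA_RE = re.compile(r"^MEDIA:(\S+)[ \t]*$", re.MULTILINE)


def extract_media(text: str) -> Tuple[List[Tuple[str, bool]], str]:
    """Stand-in: returns ([(path, is_voice)], cleaned_text)."""
    is_voice = "[[audio_as_voice]]" in text
    media = [(p, is_voice) for p in _MEDIA_RE.findall(text)]
    cleaned = _MEDIA_RE.sub("", text)
    for directive in _DIRECTIVES:
        cleaned = cleaned.replace(directive, "")
    return media, cleaned.strip()


class TestExtractMediaSketch:
    def test_as_document_stripped_from_cleaned_text(self):
        _, cleaned = extract_media("done\n[[as_document]]\nMEDIA:/tmp/a.jpg")
        assert cleaned == "done"

    def test_as_document_does_not_set_voice_flag(self):
        media, _ = extract_media("[[as_document]]\nMEDIA:/tmp/a.jpg")
        assert media == [("/tmp/a.jpg", False)]

    def test_both_directives_coexist(self):
        media, cleaned = extract_media(
            "[[as_document]]\n[[audio_as_voice]]\nMEDIA:/tmp/a.ogg"
        )
        assert media == [("/tmp/a.ogg", True)]
        assert "[[" not in cleaned
```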

Verification

Tested locally with info-graph (https://github.com/leon7609/info-graph v0.2.2 emits the directive on every successful render). Before patch: 3840×2160 JPG → Telegram delivers compressed photo at ~1280px. After patch: same source → Telegram delivers original 1MB JPEG via sendDocument, fully legible.

Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB) currently lose quality in Telegram delivery because `_IMAGE_EXTS` membership routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ 1280px max edge. The original bytes only survive when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach for files whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response), and the directive string is stripped from the user-visible cleaned text in `extract_media` so it never leaks.

Granularity is intentionally all-or-nothing per response, matching `[[audio_as_voice]]`'s scope. Skills that need fine control can split into two responses.

Verified the targeted use case: info-graph emits "Infographic generated (...) [[as_document]] MEDIA:/tmp/info-graph-x/infographic.jpg" → Telegram receives `infographic.jpg` via sendDocument, original 1MB JPEG bytes preserved, no recompression. Forwarding and download filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation from voice flag, and coexistence with `[[audio_as_voice]]`. All 113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alt-glitch added labels on May 3, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter)

This was referenced May 3, 2026

feat(gateway): support [[as_document]] directive for skill media routing #19068

Closed

feat(gateway): support [[as_document]] directive for skill media routing #19067

Closed

teknium1 mentioned this pull request May 7, 2026

feat(gateway): [[as_document]] directive for skill media routing (salvage #19069) #21210

Merged

teknium1 closed this in #21210 May 7, 2026

leon7609 wants to merge 1 commit into NousResearch:main from leon7609:feat/as-document-directive
