fix(telegram): treat image documents (PNG/JPG/WebP/GIF) as images #16710

Closed
stabruriss wants to merge 5 commits into NousResearch:main from stabruriss:stabruriss/tg-png-support

@stabruriss

Summary

  • Telegram messages where the user uploads a PNG/JPG/WebP/GIF as a file (instead of as a photo) arrive as msg.document, not msg.photo. The document handler currently rejects them with Unsupported document type '.png' because the image extensions aren't in SUPPORTED_DOCUMENT_TYPES. The vision tool never sees the image, so the agent replies that it can't see the file.
  • Route image-typed documents through the same image-cache path used for native photos: download → cache_image_from_bytes → MessageType.PHOTO → _queue_media_group_event / _enqueue_photo_event. This means albums and bursts of mixed photo/image-document messages are also buffered correctly.
  • Re-implements the idea from the previously closed PR #13215 ("fix: treat Telegram image documents as images").
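
The album buffering mentioned in the second bullet can be illustrated with a small sketch. This is not the adapter's code: `PhotoEvent`, `MediaGroupBuffer`, and the cache paths are hypothetical stand-ins, and the real `_queue_media_group_event` presumably flushes on a timer rather than on an explicit call. The only point shown is that once an image document is cached and tagged as a photo, it can share a buffer with native photos from the same album.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhotoEvent:
    """Hypothetical stand-in for the gateway's photo event."""
    media_group_id: Optional[str]
    image_paths: List[str] = field(default_factory=list)


class MediaGroupBuffer:
    """Collects cached images that share a media_group_id into one event."""

    def __init__(self) -> None:
        self._pending: Dict[str, PhotoEvent] = {}

    def add(self, media_group_id: Optional[str], cached_path: str) -> Optional[PhotoEvent]:
        """Buffer an album item, or return a ready event for a standalone image."""
        if media_group_id is None:
            return PhotoEvent(None, [cached_path])
        event = self._pending.setdefault(media_group_id, PhotoEvent(media_group_id))
        event.image_paths.append(cached_path)
        return None

    def flush(self, media_group_id: str) -> Optional[PhotoEvent]:
        """Emit and forget the buffered album, or return None if nothing is pending."""
        return self._pending.pop(media_group_id, None)


buffer = MediaGroupBuffer()
buffer.add("album-1", "/cache/native_photo.jpg")     # arrived as msg.photo
buffer.add("album-1", "/cache/document_upload.png")  # arrived as msg.document
album_event = buffer.flush("album-1")
```

Because both items enter through the same `add` call, the source of the image (photo or document) no longer matters by the time the event is built.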

Changes

  • gateway/platforms/base.py: add SUPPORTED_IMAGE_TYPES (.png/.jpg/.jpeg/.webp/.gif).
  • gateway/platforms/telegram.py: in the msg.document branch, after video-document handling, route image documents through the photo pipeline. MIME-type fallback when filename is missing. 20 MB size cap consistent with other document types.
  • tests/gateway/test_telegram_documents.py: 6 new tests — PNG, JPEG, WebP, MIME-only resolution, oversized rejection, and a mixed-album case where a native photo and a PNG-as-document get buffered into one event.
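
A minimal sketch of the detection logic described in the list above, under stated assumptions: the extension set comes from this PR, but `MIME_TO_EXT`, `MAX_IMAGE_DOCUMENT_BYTES`, `resolve_image_extension`, and `should_route_as_photo` are hypothetical names, and reading "20 MB" as 20 * 1024 * 1024 bytes is a guess. The actual checks in gateway/platforms/telegram.py may be structured differently.

```python
import os
from typing import Optional

# Extensions listed in this PR; the constant name mirrors the description.
SUPPORTED_IMAGE_TYPES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}

# Hypothetical MIME-to-extension table for documents without a usable filename.
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/webp": ".webp",
    "image/gif": ".gif",
}

# Assumption: "20 MB" is interpreted as 20 MiB here.
MAX_IMAGE_DOCUMENT_BYTES = 20 * 1024 * 1024


def resolve_image_extension(file_name: Optional[str], mime_type: Optional[str]) -> Optional[str]:
    """Return the image extension for a document, or None if it is not an image."""
    if file_name:
        ext = os.path.splitext(file_name)[1].lower()
        if ext in SUPPORTED_IMAGE_TYPES:
            return ext
    # Fall back to the MIME type when the filename is missing or inconclusive.
    if mime_type:
        return MIME_TO_EXT.get(mime_type.lower())
    return None


def should_route_as_photo(file_name: Optional[str], mime_type: Optional[str], file_size: int) -> bool:
    """True when the document is a supported image within the size cap."""
    if resolve_image_extension(file_name, mime_type) is None:
        return False
    return file_size <= MAX_IMAGE_DOCUMENT_BYTES
```

An explicit MIME table is used instead of `mimetypes.guess_extension` because the latter's result for `image/jpeg` has varied across Python versions.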

Test plan

  • `pytest tests/gateway/test_telegram_documents.py -q` — 42 passed (36 pre-existing + 6 new)
  • `python -m py_compile gateway/platforms/base.py gateway/platforms/telegram.py tests/gateway/test_telegram_documents.py`
  • Manual: send a PNG via Telegram with "send as file" checked; confirm the agent describes the image instead of replying "unsupported"

stabruriss and others added 5 commits April 20, 2026 14:59
The upstream Dockerfile sets USER hermes before ENTRYPOINT, which
prevents docker/entrypoint.sh from running its gosu-based chown block
when Railway mounts a root-owned Volume at /opt/data. Starting as root
lets the entrypoint fix ownership, then drop privileges via gosu.

The Telegram/Discord messaging gateway reads model.default only from
config.yaml (see gateway/run.py _resolve_gateway_model), so the
HERMES_MODEL env var has no effect on those gateways. Patch the
entrypoint to rewrite config.yaml's model.default from HERMES_MODEL
when set, making the env var the single source of truth in
container deployments.

PyYAML writes scalar strings without quotes by default, so the
previous regex (which required double-quoted values) silently no-op'd
on real config.yaml files. Switch to a yaml.safe_load/safe_dump round
trip, which is robust to either quoting style and to wholly missing
model sections.

When a user uploaded a PNG via Telegram with "send as file" (so it
arrives as msg.document, not msg.photo), the document handler rejected
it with "Unsupported document type '.png'" because PNG is not in
SUPPORTED_DOCUMENT_TYPES. The vision tool never saw the image.

Route image-typed documents through the same image cache path used for
native photos so the agent can actually see them. Album and burst
buffering work the same as native photos.

Re-implements the idea from closed PR NousResearch#13215.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
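
The config.yaml rewrite described in the commit messages above could look roughly like the following. This is a sketch, not the entrypoint's actual code: `apply_model_override` and `rewrite_config` are hypothetical names, and the only details taken from the commits are the HERMES_MODEL variable, the model.default key, and the use of yaml.safe_load / yaml.safe_dump.

```python
import os
from typing import Any, Dict, Optional


def apply_model_override(config: Optional[Dict[str, Any]], model: Optional[str]) -> Dict[str, Any]:
    """Return a copy of config with model.default set, creating the section if needed."""
    result = dict(config or {})
    if not model:
        return result
    section = result.get("model")
    # A missing or non-mapping `model` entry is replaced with a fresh mapping.
    section = dict(section) if isinstance(section, dict) else {}
    section["default"] = model
    result["model"] = section
    return result


def rewrite_config(path: str) -> None:
    """Round-trip config.yaml through PyYAML, applying HERMES_MODEL when it is set."""
    import yaml  # PyYAML; imported here so the pure helper above has no dependency

    model = os.environ.get("HERMES_MODEL")
    if not model:
        return
    try:
        with open(path, "r", encoding="utf-8") as fh:
            loaded = yaml.safe_load(fh)
    except FileNotFoundError:
        loaded = None
    config = loaded if isinstance(loaded, dict) else None
    with open(path, "w", encoding="utf-8") as fh:
        yaml.safe_dump(apply_model_override(config, model), fh, sort_keys=False)
```

Parsing and re-serializing sidesteps the quoting problem entirely, since `model: foo` and `model: "foo"` load to the same string. One trade-off worth noting is that a safe_load/safe_dump round trip discards any comments in the original file.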
@alt-glitch added the type/bug, P2, comp/gateway, platform/telegram, and tool/vision labels Apr 27, 2026
@laydros

laydros commented May 2, 2026

I hit this too and filed #18620 with repro details. This looks like the right PR for the fix.

@teknium1

Contributor

This is already implemented on current main by the later Telegram image-document salvage work. Thanks for the report and patch; the prior discussion was useful, including @laydros confirming the same repro in #18620.

Automated hermes-sweeper review evidence:

  • gateway/platforms/base.py:1150 defines SUPPORTED_IMAGE_DOCUMENT_TYPES for .jpg, .jpeg, .png, .webp, and .gif.
  • gateway/platforms/telegram.py:5562 detects image documents by extension or image/* MIME, downloads the bytes, caches via cache_image_from_bytes, and sets MessageType.PHOTO.
  • gateway/platforms/telegram.py:5582 routes those image-document events through _queue_media_group_event / _enqueue_photo_event, matching the native photo batching path requested here.
  • tests/gateway/test_telegram_documents.py:264 covers a PNG sent as a Telegram document and asserts it is enqueued as a photo instead of going through the unsupported-document path.
  • The fix is present in commit 77c4675a50db7abbfd191d4fba4746b4f3e1559e ("fix(telegram): route image documents (.png/.jpg/.webp/.gif) through vision pipeline"), and the superseding salvage PR #28519, which carries the same title, was merged.

@teknium1 closed this Jun 10, 2026
@teknium1 added the sweeper:implemented-on-main label Jun 10, 2026