fix(telegram): treat image documents (PNG/JPG/WebP/GIF) as images #16710
Closed
stabruriss wants to merge 5 commits into
The upstream Dockerfile sets USER hermes before ENTRYPOINT, which prevents docker/entrypoint.sh from running its gosu-based chown block when Railway mounts a root-owned Volume at /opt/data. Starting as root lets the entrypoint fix ownership, then drop privileges via gosu.
The Telegram/Discord messaging gateway reads model.default only from config.yaml (see gateway/run.py _resolve_gateway_model), so the HERMES_MODEL env var has no effect on those gateways. Patch the entrypoint to rewrite config.yaml's model.default from HERMES_MODEL when set, making the env var the single source of truth in container deployments.
PyYAML writes scalar strings without quotes by default, so the previous regex (which required double-quoted values) silently no-op'd on real config.yaml files. Switch to a yaml.safe_load/safe_dump round trip — robust to either quoting style and to wholly missing model sections.
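The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the entrypoint's actual code: `QUOTED_DEFAULT` is a hypothetical stand-in for the earlier regex (the real pattern is not shown here), and `set_model_default` and `apply_env_override` are names invented for this example. Only `yaml.safe_load`, `yaml.safe_dump`, the `model.default` key, and the `HERMES_MODEL` variable come from the description above.

```python
import os
import re

import yaml  # PyYAML

# Hypothetical stand-in for the earlier regex: it only matches a
# double-quoted value, so it never fires on PyYAML's unquoted output.
QUOTED_DEFAULT = re.compile(r'^(\s*default:\s*)"[^"]*"', re.MULTILINE)


def set_model_default(config_text: str, model: str) -> str:
    """Set model.default via a safe_load/safe_dump round trip."""
    data = yaml.safe_load(config_text)
    if not isinstance(data, dict):
        # Empty file (None) or a non-mapping document: start from scratch.
        data = {}
    section = data.get("model")
    if not isinstance(section, dict):
        # Missing or scalar "model" entry: replace it with a mapping.
        section = {}
    section["default"] = model
    data["model"] = section
    return yaml.safe_dump(data, sort_keys=False)


def apply_env_override(config_text: str, env=os.environ) -> str:
    """Rewrite model.default only when HERMES_MODEL is set and non-empty."""
    model = env.get("HERMES_MODEL")
    if not model:
        return config_text
    return set_model_default(config_text, model)


if __name__ == "__main__":
    unquoted = "model:\n  default: old-model\n"
    _, hits = QUOTED_DEFAULT.subn(r'\1"new-model"', unquoted)
    print("regex replacements on unquoted YAML:", hits)
    print(apply_env_override(unquoted, {"HERMES_MODEL": "new-model"}))
```

One trade-off of the round trip is that PyYAML does not preserve comments or original formatting, so a hand-edited config.yaml would come back normalized.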
When a user uploaded a PNG via Telegram with "send as file" (so it arrives as msg.document, not msg.photo), the document handler rejected it with "Unsupported document type '.png'" because PNG is not in SUPPORTED_DOCUMENT_TYPES. The vision tool never saw the image. Route image-typed documents through the same image cache path used for native photos so the agent can actually see them. Album and burst buffering work the same as native photos. Re-implements the idea from closed PR NousResearch#13215.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
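The routing decision described here can be sketched roughly as below. This is an illustrative sketch, not the code in gateway/platforms/telegram.py: the extension set and the 20 MB cap come from the PR summary, while `MIME_TO_EXT`, `resolve_image_ext`, `classify_document`, the returned labels, and the reading of "20 MB" as 20 * 1024 * 1024 bytes are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions the PR summary lists for SUPPORTED_IMAGE_TYPES.
SUPPORTED_IMAGE_TYPES = {".png", ".jpg", ".jpeg", ".webp", ".gif"}

# Hypothetical MIME table used when the document has no usable filename.
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/webp": ".webp",
    "image/gif": ".gif",
}

# Assumed byte value for the 20 MB cap mentioned in the PR summary.
MAX_IMAGE_DOCUMENT_BYTES = 20 * 1024 * 1024


def resolve_image_ext(file_name: Optional[str], mime_type: Optional[str]) -> Optional[str]:
    """Return an image extension for the document, or None if it is not an image."""
    if file_name:
        ext = PurePosixPath(file_name).suffix.lower()
        if ext in SUPPORTED_IMAGE_TYPES:
            return ext
    if mime_type:
        # Fall back to the MIME type when the filename is missing or unhelpful.
        return MIME_TO_EXT.get(mime_type.lower())
    return None


def classify_document(file_name: Optional[str], mime_type: Optional[str],
                      file_size: Optional[int]) -> str:
    """Label a document as 'photo', 'too_large', or 'document' (hypothetical labels)."""
    if resolve_image_ext(file_name, mime_type) is None:
        return "document"  # leave it to the regular document handler
    if file_size is not None and file_size > MAX_IMAGE_DOCUMENT_BYTES:
        return "too_large"
    return "photo"  # would go through the same path as a native photo


if __name__ == "__main__":
    print(classify_document("screenshot.PNG", None, 120_000))
    print(classify_document(None, "image/webp", 120_000))
    print(classify_document("report.pdf", "application/pdf", 120_000))
```

Checking the extension first and the MIME type second keeps the filename authoritative when both are present, which is one reasonable ordering; the actual handler may differ.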
This was referenced May 1, 2026
I hit this too and filed #18620 with repro details. This looks like the right PR for the fix.
This was referenced May 2, 2026
Contributor
This is already implemented on current Automated hermes-sweeper review evidence:
Summary
- Images sent through Telegram's "send as file" option arrive as msg.document, not msg.photo. The document handler currently rejects them with Unsupported document type '.png' because the image extensions aren't in SUPPORTED_DOCUMENT_TYPES. The vision tool never sees the image, so the agent replies that it can't see the file.
- Image-typed documents now follow the same path as native photos: cache_image_from_bytes → MessageType.PHOTO → _queue_media_group_event / _enqueue_photo_event. This means albums and bursts of mixed photo/image-document messages are also buffered correctly.
Changes
- gateway/platforms/base.py: add SUPPORTED_IMAGE_TYPES (.png/.jpg/.jpeg/.webp/.gif).
- gateway/platforms/telegram.py: in the msg.document branch, after video-document handling, route image documents through the photo pipeline. Falls back to the MIME type when the filename is missing. 20 MB size cap, consistent with other document types.
- tests/gateway/test_telegram_documents.py: 6 new tests: PNG, JPEG, WebP, MIME-only resolution, oversized rejection, and a mixed-album case where a native photo and a PNG-as-document get buffered into one event.
Test plan