Summary
The image_generate tool returns xAI ephemeral tmp URLs (https://imgen.x.ai/xai-tmp-imgen-*.jpeg) that 404 by the time Telegram's send_photo fetches them. All three tiers of the delivery fallback (media_group → URL send_photo → file-upload send_photo) fail, and the final file-upload attempt reports 404 Not Found. A related problem likely affects video_gen. Its URLs ARE persistent (vidgen.x.ai/xai-vidgen-bucket/), but the LLM emits markdown image syntax instead of using a media-routing API shape.
Reproduction
- Use a profile on the xai-oauth provider + grok-4.3 model with the image_gen.provider: xai config.
- Send via Telegram: "Make me a picture of a spring morning in the garden." (original German prompt: "Mach mir ein Bild von einem Frühlingsmorgen im Garten.")
- Gateway log shows:
tool image_generate completed (5.60s, 556 chars)
[Telegram] Sending media group of 1 photo(s) (chunk 1/1)
[Telegram] Sending image: https://imgen.x.ai/.../xai-tmp-imgen-29c86b54-9557-4bb8-95df-ef8a3fc2fa05.jpeg (alt=Frühlingsmorgen im Garten)
WARNING [Telegram] send_media_group failed (chunk 1/1), falling back to per-image: ... "webpage_curl_failed"
WARNING [Telegram] URL-based send_photo failed, trying file upload: Wrong type of the web page content
ERROR [Telegram] File upload send_photo also failed: Client error '404 Not Found' for url 'https://imgen.x.ai/xai-tmp-imgen-*.jpeg'
Proposed fix
Mirror the existing audio_cache pattern used for text_to_speech (which returns a local file_path like ~/.hermes/profiles/<p>/audio_cache/tts_*.ogg):
- At image_generate tool-completion time, download the URL bytes synchronously.
- Store them at ~/.hermes/profiles/<p>/image_cache/imgen_<iso>.<ext> (chmod 600 for parity with the audio cache).
- Replace the URL in the tool result payload with the local path (file_path field), optionally keeping the original URL as source_url for traceability.
- The Telegram adapter then takes the local-file upload path, which already works (audio_cache voice bubbles work end-to-end).
For video_gen URLs that ARE persistent on vidgen.x.ai/xai-vidgen-bucket/, the issue is different: the LLM emits markdown image syntax, which Telegram doesn't render as video. Either a video_cache or a normalized return shape with a clearer routing hint (e.g., an explicit media_type: "video" field + delivery via send_video glue) would solve this.
Why this matters
Without the cache, every image_gen call on a Telegram-routed profile delivers no image, only the text fallback. To end users this is functionally indistinguishable from a complete failure. The TTS path works precisely because of the audio_cache step; the symmetric step is missing for image_gen.
Hermes version + commit
- Hermes Agent v0.13.0 (2026.5.7)
- Install commit 5f91b1a48b06c8260dc539614abda27cf4e831cb (post-hermes update to 374dc81c2359a6f61e8d1efc49de29d61d7b9a88)
- Python 3.11.14, mcp 1.26.0, python-telegram-bot 22.7
- Install-script-managed at ~/.hermes/hermes-agent/
Discovered while running a Pass B migration battery (xAI provider switch + tool enablement). Detailed evidence and a workaround are in our migration log, if useful for repro/triage.