Skip to content

feat(desktop): support base64 image upload for remote backends#39437

Closed
RyTsYdUp wants to merge 1 commit into
NousResearch:mainfrom
RyTsYdUp:fix/desktop-remote-image-upload
Closed

feat(desktop): support base64 image upload for remote backends#39437
RyTsYdUp wants to merge 1 commit into
NousResearch:mainfrom
RyTsYdUp:fix/desktop-remote-image-upload

Conversation

@RyTsYdUp

@RyTsYdUp RyTsYdUp commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Problem

When the desktop app connects to a remote Hermes backend (e.g. a Linux server), image attachments fail. The app saves clipboard/dropped images to a Windows-local temp path (e.g. C:\Users\...\tmp.png), then calls image.attach with that path. The remote Linux backend cannot read Windows filesystem paths and returns image not found.

Affects both clipboard paste (Ctrl+V) and file drag-and-drop when using HERMES_DESKTOP_REMOTE_URL.

Solution

tui_gateway/server.py

image.attach now accepts an optional data parameter (base64 data URL or raw base64 + ext). When provided, bytes are decoded and written to a server-side temp file. The existing path flow is unchanged — purely additive.

apps/desktop/src/app/session/hooks/use-prompt-actions.ts

syncImageAttachmentsForSubmit detects Windows-style paths before calling image.attach. When a Windows path is detected and previewUrl is available (already fetched as base64 for the thumbnail), it sends data instead of path. No new IPC or network calls introduced.

Testing

Tested on Windows 11 desktop app pointing at a remote Linux Hermes instance via HERMES_DESKTOP_REMOTE_URL. Clipboard paste and drag-and-drop now successfully deliver images to the agent.

When the desktop app is connected to a remote Hermes backend, image
files on the local machine can't be read by the remote Linux host.

- tui_gateway/server.py: image.attach now accepts an optional `data`
  param (base64 data URL or raw base64 + ext). The bytes are written
  to a temp file on the server so the agent can process the image.
- apps/desktop/.../use-prompt-actions.ts: syncImageAttachmentsForSubmit
  detects Windows-style paths (C:\...) and sends the attachment's
  previewUrl (already fetched as base64) as `data` instead of `path`.

Fixes clipboard paste and drag-and-drop when the desktop app runs on
Windows pointing at a remote Linux Hermes instance.
@alt-glitch alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery P3 Low — cosmetic, nice to have labels Jun 5, 2026
teknium1 added a commit that referenced this pull request Jun 7, 2026
…splay gateway images over the network

Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.

Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
  dir and queued via the existing native-image-attach pipeline. Magic-byte
  extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
  structured error codes. Accepts content_base64/filename (canonical) and
  data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
  and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
  base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
  so the two methods and the existing image.attach don't duplicate logic.

Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
  clients can display it. Auth-gated like every /api route, extension
  allowlist + size cap, AND confined to the gateway's own media roots
  (images/screenshots/cache, resolved symlink-safe) so an authed caller can't
  read image-extension files anywhere on disk.

Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
  connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
  markdown-text fetch images over /api/media in remote mode.

Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from #38876 is intentionally left out as a
separable feature. TUI file uploads (#40492) remain a separate surface.

Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.

Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
@teknium1

teknium1 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Merged via #41336 (commit 16786f3 on main).

The remote-media work from this cluster was consolidated into one coherent implementation — image.attach_bytes + pdf.attach over the tui_gateway, root-confined GET /api/media for display, and the desktop remote-mode wiring. Your contribution was cherry-picked with your authorship preserved via a Co-authored-by trailer on the merge commit.

Verified live end-to-end over the real dashboard stack (real /api/ws WebSocket + authenticated HTTP): attach → gateway disk → queue, gateway-path → data-URL display with byte-identical round-trip, plus the security blocks (403 out-of-root, 403 symlink-escape, 415 non-image, 404 missing, 401 bad-auth). Thanks for the work!

@teknium1 teknium1 closed this Jun 7, 2026
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…splay gateway images over the network

Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.

Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
  dir and queued via the existing native-image-attach pipeline. Magic-byte
  extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
  structured error codes. Accepts content_base64/filename (canonical) and
  data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
  and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
  base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
  so the two methods and the existing image.attach don't duplicate logic.

Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
  clients can display it. Auth-gated like every /api route, extension
  allowlist + size cap, AND confined to the gateway's own media roots
  (images/screenshots/cache, resolved symlink-safe) so an authed caller can't
  read image-extension files anywhere on disk.

Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
  connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
  markdown-text fetch images over /api/media in remote mode.

Consolidates the competing remote-media PRs (NousResearch#38876, NousResearch#40317, NousResearch#21908, NousResearch#39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from NousResearch#38876 is intentionally left out as a
separable feature. TUI file uploads (NousResearch#40492) remain a separate surface.

Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.

Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
alt-glitch pushed a commit that referenced this pull request Jun 14, 2026
…splay gateway images over the network

Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.

Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
  dir and queued via the existing native-image-attach pipeline. Magic-byte
  extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
  structured error codes. Accepts content_base64/filename (canonical) and
  data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
  and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
  base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
  so the two methods and the existing image.attach don't duplicate logic.

Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
  clients can display it. Auth-gated like every /api route, extension
  allowlist + size cap, AND confined to the gateway's own media roots
  (images/screenshots/cache, resolved symlink-safe) so an authed caller can't
  read image-extension files anywhere on disk.

Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
  connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
  markdown-text fetch images over /api/media in remote mode.

Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from #38876 is intentionally left out as a
separable feature. TUI file uploads (#40492) remain a separate surface.

Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.

Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants