feat(tui_gateway): add image.attach_bytes and pdf.attach methods #21908

Closed
ccook1963 wants to merge 1 commit into NousResearch:main from ccook1963:feat/tui-gateway-attach-bytes-and-pdf-attach

Conversation

@ccook1963

Summary

Adds two JSON-RPC methods to the TUI gateway so clients (web dashboards, mobile apps, etc.) can attach images and PDFs to a session in a single round-trip without first transferring the file onto the host filesystem via SCP/SFTP.

  • image.attach_bytes — base64 image upload
  • pdf.attach — PDF → per-page PNGs via pdftoppm, queued as image attachments

Both methods reuse the existing _image_meta(), _IMAGE_EXTENSIONS allowlist, _hermes_home paths, and image_counter logic. The next prompt.submit picks the queued attachments up via the existing native-image-attach pipeline — no changes to attachment delivery downstream.

Why

Today, the only way for a remote client to attach an image to a Hermes session is image.attach, which expects a path on the host filesystem. That requires SCP/SFTP first. For a web dashboard the round-trip is awkward — every user upload becomes POST → server-side SCP → JSON-RPC image.attach. Hermie's dashboard hit this and we ended up writing a local patch.

PDFs have a related gap: Anthropic's vision pipeline accepts images, not PDFs. A user can't drop a PDF into chat and have Claude read it without page-rendering server-side. pdf.attach does that with pdftoppm (poppler-utils), 150 DPI per page, which is readable for vision without being absurdly large.

Method shapes

image.attach_bytes

{
  "method": "image.attach_bytes",
  "params": {
    "session_id": "<sid>",
    "content_base64": "iVBORw0KGgo...",        // raw OR data:image/png;base64,...
    "filename": "screenshot.png"                 // optional, drives extension
  }
}
// → { attached, path, count, remainder, text, bytes, name, width, height, token_estimate }
  • 25 MB cap (matches Anthropic image limits)
  • Strips data:image/...;base64, prefix and embedded whitespace
  • Magic-byte sniffing for PNG / JPEG / GIF / WebP / BMP if no filename
  • Defense-in-depth extension allowlist (writes to ~/.hermes/images/)
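
The decoding steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the function names, the `AttachError` class, and the choice to report an empty payload as 4017 are assumptions made for the example. The 25 MB cap and the 4017/4018 codes come from this description.

```python
import base64
import binascii
import re

MAX_IMAGE_BYTES = 25 * 1024 * 1024  # 25 MB cap stated in the PR description

# Leading byte signatures for the formats named above (WebP is handled separately).
_MAGIC = (
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
)

_DATA_URL_PREFIX = re.compile(r"^data:[^;,]+;base64,", re.IGNORECASE)


class AttachError(ValueError):
    """Hypothetical error type carrying a gateway-style numeric code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def decode_image_payload(content_base64: str) -> bytes:
    """Strip an optional data: URL prefix and whitespace, then decode."""
    text = _DATA_URL_PREFIX.sub("", content_base64.strip())
    text = re.sub(r"\s+", "", text)
    if not text:
        # Assumption: an empty payload is reported like invalid base64.
        raise AttachError(4017, "empty payload")
    try:
        raw = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise AttachError(4017, "invalid base64") from exc
    if len(raw) > MAX_IMAGE_BYTES:
        raise AttachError(4018, "payload exceeds 25 MB cap")
    return raw


def sniff_image_ext(raw: bytes):
    """Return a file extension guessed from magic bytes, or None."""
    for magic, ext in _MAGIC:
        if raw.startswith(magic):
            return ext
    # WebP is a RIFF container with the tag "WEBP" at byte offset 8.
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return ".webp"
    return None
```

In this sketch a caller would decode first, then fall back to `sniff_image_ext` only when no filename was supplied, and check the resulting extension against the allowlist before writing anything to disk.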

pdf.attach

{
  "method": "pdf.attach",
  "params": {
    "session_id": "<sid>",
    "path": "/path/to.pdf",                      // OR
    "content_base64": "JVBERi0xLjQK...",         // OR data:application/pdf;base64,...
    "filename": "report.pdf",                    // optional, for display
    "first_page": 1,                             // optional
    "last_page": 10                              // optional, default = first + 24
  }
}
// → { attached, filename, pages_attached, pages: [{path, page, name, width, height, token_estimate}, ...], count, text }
  • 50 MB PDF cap, 25 pages per call cap
  • Validates %PDF- magic bytes for base64 input
  • Returns a clean 5028 error if pdftoppm is not on PATH
  • Each page auto-queues into attached_images so prompt.submit picks them up
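
For illustration, the page-range defaulting and the pdftoppm call described above might look something like this. It is a sketch under stated assumptions rather than the patch itself: the function names, the `page` output prefix, and the 120-second timeout are made up for the example. The `-png`, `-r`, `-f`, and `-l` options are standard pdftoppm flags.

```python
import shutil
import subprocess
from pathlib import Path

MAX_PAGES_PER_CALL = 25  # per-call page cap from the PR description
RENDER_DPI = 150         # render resolution from the PR description


def resolve_page_range(first_page=None, last_page=None):
    """Apply the defaults (first=1, last=first+24) and enforce the cap."""
    first = 1 if first_page is None else int(first_page)
    if last_page is None:
        last = first + MAX_PAGES_PER_CALL - 1
    else:
        last = int(last_page)
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    if last - first + 1 > MAX_PAGES_PER_CALL:
        # The PR reports this condition as error 4019.
        raise ValueError("page range exceeds 25-page cap")
    return first, last


def build_pdftoppm_cmd(pdf_path, out_prefix, first, last):
    """Build the argv list; pdftoppm writes <out_prefix>-<page>.png files."""
    return [
        "pdftoppm", "-png",
        "-r", str(RENDER_DPI),
        "-f", str(first),
        "-l", str(last),
        str(pdf_path), str(out_prefix),
    ]


def render_pages(pdf_path, out_dir, first_page=None, last_page=None, timeout=120):
    """Render the selected pages to PNG and return the output paths."""
    if shutil.which("pdftoppm") is None:
        # The PR reports a missing binary as error 5028.
        raise RuntimeError("pdftoppm not found on PATH")
    first, last = resolve_page_range(first_page, last_page)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_pdftoppm_cmd(pdf_path, out_dir / "page", first, last)
    subprocess.run(cmd, check=True, capture_output=True, timeout=timeout)
    return sorted(out_dir.glob("page-*.png"))
```

Passing the command as an argv list (no shell) keeps a user-supplied filename from being interpreted by a shell, which matters when the path comes from a remote client.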

Tests

7 new pytest cases in tests/test_tui_gateway_server.py:

  • test_image_attach_bytes_accepts_raw_base64
  • test_image_attach_bytes_strips_data_url_prefix
  • test_image_attach_bytes_rejects_invalid_base64 (4017)
  • test_image_attach_bytes_rejects_oversized (4018)
  • test_pdf_attach_renders_pages_when_pdftoppm_available (skips if poppler not installed)
  • test_pdf_attach_rejects_non_pdf_payload
  • test_pdf_attach_rejects_oversized_page_range (4019)

All 9 image_attach/pdf_attach tests pass on Python 3.12 with poppler-utils 24.02.0:

9 passed in 4.76s

Out-of-band testing

This patch has been running on a 5-VPS production fleet for ~1 hour as of this PR. No regressions observed.

Dependencies

  • pdf.attach requires pdftoppm (apt install poppler-utils on Debian/Ubuntu, brew install poppler on macOS)
  • No new Python dependencies — uses only stdlib (base64, re, subprocess, tempfile, shutil)
  • No changes to AGENTS.md slash-command registry, gateway hooks, or platform adapters

Error codes

Reuses existing convention from image.attach and clipboard.paste. New codes:

  • 4017 — invalid base64 / not a PDF
  • 4018 — payload exceeds size cap
  • 4019 — page range exceeds cap
  • 5028 — pdftoppm not installed / failed / timed out
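
A client could turn these codes into readable messages along the following lines. This is only a sketch: it assumes the gateway's error object carries `code` and `message` fields in the usual JSON-RPC style, and the helper name is hypothetical.

```python
# Codes and meanings copied from the list above.
ATTACH_ERROR_CODES = {
    4017: "invalid base64 / not a PDF",
    4018: "payload exceeds size cap",
    4019: "page range exceeds cap",
    5028: "pdftoppm not installed / failed / timed out",
}


def describe_attach_error(error: dict) -> str:
    """Format an error object (assumed shape: {"code": int, "message": str})."""
    code = error.get("code")
    label = ATTACH_ERROR_CODES.get(code, "unrecognized error")
    detail = error.get("message")
    text = f"[{code}] {label}"
    return f"{text}: {detail}" if detail else text
```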

Adds two JSON-RPC methods so clients (e.g. dashboards) can attach images
and PDFs to a session in a single round-trip, without first transferring
the file onto the host filesystem via SCP/SFTP.

image.attach_bytes:
  Accepts base64-encoded image bytes (with optional data: URL prefix or
  filename hint), writes to ~/.hermes/images/, queues into the session's
  attached_images list. 25 MB cap, magic-byte sniffing for PNG/JPEG/GIF/
  WebP/BMP, defense-in-depth extension allowlist. Same response shape as
  image.attach.

pdf.attach:
  Accepts an on-disk PDF path or base64 PDF bytes, runs pdftoppm at 150
  DPI to render each page to PNG, queues each page-image into
  attached_images. Anthropic's vision pipeline accepts images, not PDFs,
  so this fills the gap for clients that want 'drop a PDF into a chat'
  semantics. 50 MB PDF cap, 25 pages per call cap. Validates %PDF- magic
  bytes for base64 input.

Both methods reuse the existing _image_meta(), _IMAGE_EXTENSIONS allowlist,
_hermes_home paths, and image_counter logic, so the existing prompt.submit
native-attach handling picks them up unchanged.

Tests: 7 new pytest cases in tests/test_tui_gateway_server.py covering
empty input, invalid base64, oversized payload, magic-byte sniffing,
data URL prefix, on-disk PDF rendering, non-PDF rejection, page-range
cap. pdf.attach tests skip cleanly when pdftoppm is not installed.
alt-glitch added labels May 11, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/tui (Terminal UI: ui-tui/ + tui_gateway/)
teknium1 added a commit that referenced this pull request Jun 7, 2026
…splay gateway images over the network

Desktop connected to a remote gateway can now attach images and PDFs and
display agent-written images. Previously the desktop passed a LOCAL file path
to image.attach; on a remote gateway that path doesn't exist, so the image was
silently dropped ("skipped unreadable path") and the vision model never saw it.
The reverse direction was also broken — images the agent wrote on the gateway
rendered as dead links in the remote client.

Gateway (tui_gateway/server.py):
- image.attach_bytes: base64 byte upload written into the gateway's own images
  dir and queued via the existing native-image-attach pipeline. Magic-byte
  extension sniffing, data-URL prefix + whitespace tolerance, 25 MB cap,
  structured error codes. Accepts content_base64/filename (canonical) and
  data/ext (older-desktop aliases).
- pdf.attach: renders each page to PNG via pdftoppm (poppler-utils) at 150 DPI
  and queues the pages as images; 50 MB / 25-page caps. Accepts host path or
  base64 upload.
- Shared helpers (_decode_attach_base64, _sniff_image_ext, _queue_attached_image)
  so the two methods and the existing image.attach don't duplicate logic.

Gateway (hermes_cli/web_server.py):
- GET /api/media: returns a gateway-local image as a base64 data URL so remote
  clients can display it. Auth-gated like every /api route, extension
  allowlist + size cap, AND confined to the gateway's own media roots
  (images/screenshots/cache, resolved symlink-safe) so an authed caller can't
  read image-extension files anywhere on disk.
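
A rough sketch of the symlink-safe root confinement described above, assuming it amounts to "resolve the requested path, then require it to sit under one of the resolved media roots". The function name and signature are hypothetical and not taken from web_server.py.

```python
from pathlib import Path


def resolve_within_roots(requested, roots):
    """Return the fully resolved path if it lies under an allowed root, else None.

    Resolving first collapses ".." segments and follows symlinks, so a link
    placed inside a root that points elsewhere is judged by its real target.
    """
    candidate = Path(requested).resolve()
    for root in roots:
        if candidate.is_relative_to(Path(root).resolve()):  # Python 3.9+
            return candidate
    return None
```

A handler built on this would answer with a 403 when the function returns None, and only then go on to the extension and size checks.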

Desktop (apps/desktop):
- syncImageAttachmentsForSubmit uploads bytes via image.attach_bytes when the
  connection mode is 'remote'; the local fast path is unchanged.
- media.ts gains isRemoteGateway() + gatewayMediaDataUrl(); directive-text and
  markdown-text fetch images over /api/media in remote mode.

Consolidates the competing remote-media PRs (#38876, #40317, #21908, #39437)
into one coherent implementation, taking the strongest parts of each and adding
shared-helper cleanup plus the /api/media root-confinement hardening on top.
The per-profile gateway switching from #38876 is intentionally left out as a
separable feature. TUI file uploads (#40492) remain a separate surface.

Tested: 11 new tui_gateway tests + 5 /api/media endpoint tests + desktop
media.remote unit tests; full tui_gateway + web_server suites green (472
passed); tsc -b clean; E2E verified the full attach→disk→queue and
gateway-path→data-URL display round-trip plus the out-of-root security block.

Co-authored-by: Max Mitcham <maxmitcham@mac.home>
Co-authored-by: Justlrnal4 <Justlrnal4@users.noreply.github.com>
Co-authored-by: Chris Cook <ccook@nvms.com>
Co-authored-by: Thomas Paquette <thomas.paquette@gmail.com>
@teknium1

teknium1 commented Jun 7, 2026

Merged via #41336 (commit 16786f3 on main).

The remote-media work from this cluster was consolidated into one coherent implementation — image.attach_bytes + pdf.attach over the tui_gateway, root-confined GET /api/media for display, and the desktop remote-mode wiring. Your contribution was cherry-picked with your authorship preserved via a Co-authored-by trailer on the merge commit.

Verified live end-to-end over the real dashboard stack (real /api/ws WebSocket + authenticated HTTP): attach → gateway disk → queue, gateway-path → data-URL display with byte-identical round-trip, plus the security blocks (403 out-of-root, 403 symlink-escape, 415 non-image, 404 missing, 401 bad-auth). Thanks for the work!

teknium1 closed this Jun 7, 2026
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…splay gateway images over the network

alt-glitch pushed a commit that referenced this pull request Jun 14, 2026
…splay gateway images over the network
