
Feature: /upload Command — Ephemeral Tunnel-Based File Upload for Remote Users #532

@teknium1


Overview

Users on messenger platforms (Telegram, Discord, WhatsApp, Slack) currently have limited options for getting files to Hermes Agent. Small files can be sent directly as platform attachments (20MB Telegram limit, 25MB Discord), but these land in ephemeral cache directories that auto-clean after 24 hours. There's no way to upload large files, bulk upload multiple documents, or send files from a device that isn't running the messenger client.

This proposal introduces a /upload command that spins up an ephemeral upload website via a tunneled connection, gives the user a URL, and accepts file uploads through a browser. The tunnel auto-shuts down after a configurable timeout (default 5 minutes). This complements the workspace feature (see companion issue) by providing a gateway for remote users to populate their workspace.

The approach: run a lightweight HTTP upload server on localhost, expose it through a zero-config tunnel (Cloudflare Quick Tunnel — free, no account needed), send the URL to the user, accept uploads, save to workspace, tear down. Simple, ephemeral, secure.

We found no existing AI agent platform that offers this workflow. Most assume local filesystem access or rely on platform-native attachment limits. This would be a novel capability.


Research Findings

Current File Handling in Hermes Agent

Incoming files from platforms (already works):

  • Telegram: bot.get_file() → download to document_cache/. 20MB hard limit via Bot API. Text files (.md, .txt ≤100KB) have content injected into message text.
  • Discord: attachment.save() → download from CDN. 25MB limit (50-100MB for boosted servers).
  • WhatsApp: Two-step Graph API download (GET media URL → download with auth token).
  • Slack: files.info → download with auth header. Three-step upload process for sending.

Limitations:

  • All files go to ephemeral cache dirs (UUID-prefixed, flat, 24h auto-cleanup)
  • No bulk upload capability
  • No cross-device upload (can't send from laptop to Telegram bot on phone)
  • No way to bypass platform size limits
  • No persistent file organization

Tunneling Solutions Evaluated

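Solution    | Config Required | Account Needed  | HTTPS        | Max Upload       | Install    | Cost
------------|-----------------|-----------------|--------------|------------------|------------|---------
cloudflared | Zero            | No              | ✓            | 100MB through CF | binary     | Free
localtunnel | Zero            | No              | ✓            | Unlimited        | npm/npx    | Free
bore        | Zero            | No              | ✗ (TCP only) | Unlimited        | cargo/brew | Free
ngrok       | Auth token      | Yes (free tier) | ✓            | Unlimited        | binary     | Freemium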

Recommendation: cloudflared Quick Tunnel as primary, localtunnel as fallback.

Cloudflare Quick Tunnel details:

# Zero-config usage: generates a random trycloudflare.com subdomain
cloudflared tunnel --url http://localhost:8080
# Output: https://random-words.trycloudflare.com

  • No account, no signup, no API keys
  • HTTPS with valid TLS certificate (Cloudflare-issued)
  • Uses Cloudflare's global Anycast network (fast from anywhere)
  • 100MB upload limit through Cloudflare's edge (sufficient for docs)
  • 200 concurrent in-flight request limit
  • Intended for development/ephemeral use — perfect fit
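
Parsing the public URL out of the tunnel's log output could look roughly like the sketch below. The function names are hypothetical, and the exact log format and stream (stdout or stderr) that cloudflared uses are assumptions here, so a real tunnel manager would probably scan both streams and only rely on the trycloudflare.com hostname pattern.

```python
import re
from typing import Iterable, Optional

# Matches the random subdomain that a quick tunnel is assigned.
# The pattern is an assumption based on the example URL in this proposal.
_TUNNEL_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(log_line: str) -> Optional[str]:
    """Return the first trycloudflare.com URL in a log line, or None."""
    match = _TUNNEL_URL.search(log_line)
    return match.group(0) if match else None


def first_tunnel_url(log_lines: Iterable[str]) -> Optional[str]:
    """Scan log lines in order and return the first tunnel URL found."""
    for line in log_lines:
        url = extract_tunnel_url(line)
        if url:
            return url
    return None
```

In practice `first_tunnel_url` would be fed the lines read from the tunnel subprocess, with a deadline so a tunnel that never comes up does not block the command forever.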

localtunnel details:

npx localtunnel --port 8080
# Output: https://random.loca.lt

  • Zero config, no account
  • Node.js dependency (already needed for WhatsApp bridge)
  • Self-hostable server available
  • Less reliable than cloudflared but good fallback

How Other Platforms Handle File Upload

We found no other AI agent/bot framework that offers tunnel-based file upload. The closest patterns:

  • ChatGPT: Direct file upload in web UI (drag & drop), stored server-side
  • Telegram Bot API: Platform-native attachment handling (20MB limit, no workaround without custom server)
  • Discord bots: CDN-hosted attachments, downloaded by bot
  • Ephemeral file sharing services: transfer.sh (defunct), 0x0.st, file.io — upload via curl, get download URL. These are intermediaries, not tunnel-based.

The tunnel approach is novel: instead of routing files through a third-party storage service, we expose the agent's own upload endpoint directly. Files pass through the tunnel provider's edge in transit but are stored only on the agent host.


Current State in Hermes Agent

What we have:

What we don't have:

  • No /upload command
  • No tunnel infrastructure
  • No ephemeral web server capability
  • No file upload web UI
  • No way to bypass platform size limits

Implementation Plan

Core Architecture: Classification

This is a core codebase change + bundled skill hybrid:

  • The /upload command and tunnel management are core (need gateway command integration)
  • The upload web server could be a standalone Python module
  • The upload page HTML/JS is a static asset
  • cloudflared detection and fallback logic needs to be in the core

Upload Flow

User: /upload              (or /upload 30 for 30-minute window)
  │
  ├─ Agent starts HTTP upload server on random localhost port
  ├─ Agent launches cloudflared quick tunnel → gets public URL
  ├─ Agent generates single-use upload token
  ├─ Agent sends URL + token to user:
  │    "📤 Upload files here (link expires in 5 min):
  │     https://random-words.trycloudflare.com/u/abc123
  │     Max 100MB per file. Drag & drop or click to browse."
  │
  ├─ User opens URL in browser
  │    ├─ Clean upload page with drag-and-drop zone
  │    ├─ File type indicators (📄 PDF, 📊 Excel, etc.)
  │    ├─ Upload progress bar
  │    └─ "Upload complete!" confirmation
  │
  ├─ Files saved to ~/.hermes/workspace/uploads/
  ├─ Agent confirms in chat:
  │    "✅ Received 3 files:
  │     - quarterly-report.pdf (2.4 MB)
  │     - meeting-notes.md (12 KB)
  │     - data.csv (847 KB)
  │     Saved to workspace. I can read these now!"
  │
  └─ Tunnel + server shut down (timeout or /upload stop)
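
The first step of the flow, turning the chat message into a start or stop request, might be sketched as follows. The function name, the return shape, and the 60-minute cap are assumptions for illustration; only the 5-minute default, the optional minutes argument, and `/upload stop` come from this proposal.

```python
DEFAULT_TIMEOUT_MIN = 5   # default window from the proposal
MAX_TIMEOUT_MIN = 60      # assumed upper bound, not specified in the proposal


def parse_upload_command(text: str):
    """Parse '/upload', '/upload <minutes>' or '/upload stop'.

    Returns ("start", minutes) or ("stop", None); raises ValueError otherwise.
    """
    parts = text.strip().split()
    if not parts or parts[0] != "/upload":
        raise ValueError("not an /upload command")
    if len(parts) == 1:
        return ("start", DEFAULT_TIMEOUT_MIN)
    if len(parts) > 2:
        raise ValueError("usage: /upload [minutes|stop]")
    arg = parts[1].lower()
    if arg == "stop":
        return ("stop", None)
    # Accept plain ASCII digits only, and reject a zero-minute window.
    if not (arg.isascii() and arg.isdigit()) or int(arg) == 0:
        raise ValueError("timeout must be a positive number of minutes")
    return ("start", min(int(arg), MAX_TIMEOUT_MIN))
```

Clamping to a maximum rather than rejecting large values is a design choice; rejecting would be equally reasonable if a long-lived tunnel is considered a risk.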

What We'd Need

  1. Upload HTTP server (tools/upload_server.py or gateway/upload.py) — lightweight async HTTP server (aiohttp or built-in http.server) with multipart file upload handling
  2. Upload web page — single-page HTML/JS with drag-and-drop, progress bars, file type validation. Embedded as a string constant or bundled asset.
  3. Tunnel manager — detect cloudflared availability, launch quick tunnel, parse URL from stdout, manage lifecycle, fallback to localtunnel
  4. /upload command — gateway slash command with optional timeout argument
  5. Upload token system — single-use or time-limited tokens for auth (prevent random internet users from uploading)
  6. File validation — size limits, type allowlists, malware scanning (ClamAV if available)
  7. Workspace integration — save uploaded files to ~/.hermes/workspace/uploads/, update manifest
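
For item 6, a minimal validation sketch might look like this. The extension sets are illustrative placeholders, not a proposed final allowlist, and the helper names are hypothetical; malware scanning is left out entirely.

```python
from pathlib import PurePosixPath

MAX_FILE_BYTES = 100 * 1024 * 1024  # 100MB per file, as in the proposal
# Illustrative sets only; the real allowlist would live in config.
ALLOWED_EXTENSIONS = {".pdf", ".md", ".txt", ".csv", ".json", ".png", ".jpg", ".zip"}
BLOCKED_EXTENSIONS = {".exe", ".sh", ".bat"}


def sanitize_filename(raw: str) -> str:
    """Strip directories and unusual characters from a client-supplied name."""
    # Treat backslashes as separators so Windows-style paths are stripped too.
    name = PurePosixPath(raw.replace("\\", "/")).name
    cleaned = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
    cleaned = cleaned.lstrip(".")  # no hidden files, no "." or ".."
    if not cleaned:
        raise ValueError("empty filename")
    return cleaned


def validate_upload(raw_name: str, size_bytes: int) -> str:
    """Return a safe filename, or raise ValueError if the file is rejected."""
    name = sanitize_filename(raw_name)
    ext = PurePosixPath(name).suffix.lower()
    if ext in BLOCKED_EXTENSIONS or ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"file type not allowed: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file exceeds the per-file size limit")
    return name
```

The returned name is meant to be joined onto the fixed uploads directory, so the client never controls any path component.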

Security Model

1. Upload token: random 32-char token, single-use or time-limited
2. URL structure: https://tunnel-url/u/{token} — token required for access
3. Size limit: 100MB per file (configurable), 500MB total per session
4. File type allowlist: documents, code, data, images, archives
   Blocked: executables (.exe, .sh, .bat), system files
5. Rate limiting: max 20 files per upload session
6. Auto-shutdown: tunnel killed after timeout (default 5 min)
7. No directory traversal: filenames sanitized, saved to fixed directory
8. HTTPS only: cloudflared provides TLS (never expose plain HTTP)
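
Points 1 and 6 could be combined in a small token object like the sketch below. The class name and its interface are hypothetical; the injectable clock exists only to make expiry testable.

```python
import hmac
import secrets
import time


class UploadToken:
    """Random URL token that expires after a TTL and can be consumed once."""

    def __init__(self, ttl_seconds: float = 300, clock=time.monotonic):
        # 24 random bytes encode to exactly 32 URL-safe characters.
        self.value = secrets.token_urlsafe(24)
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self._used = False

    def check(self, candidate: str, consume: bool = False) -> bool:
        """Return True if candidate matches and the token is still valid."""
        if self._used or self._clock() >= self._expires_at:
            return False
        # Constant-time comparison to avoid leaking the token via timing.
        if not hmac.compare_digest(candidate.encode(), self.value.encode()):
            return False
        if consume:
            self._used = True
        return True
```

One possible usage: call `check(token)` when serving the upload page and `check(token, consume=True)` when the session is closed, so several files can be uploaded within one window while the link still becomes useless afterwards.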

Phased Rollout

Phase 1: Basic Upload with Cloudflared

  • /upload [minutes] command in gateway (all platforms) and CLI
  • Minimal HTTP upload server (Python stdlib http.server + multipart parsing)
  • Simple but functional upload page (HTML form + JS progress)
  • cloudflared quick tunnel (detect binary, launch, parse URL)
  • Single-use token authentication
  • Files saved to ~/.hermes/workspace/uploads/ (or ~/.hermes/uploads/ if workspace not yet implemented)
  • Auto-shutdown after timeout
  • /upload stop to manually tear down
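
A rough sketch of the Phase 1 server skeleton using only the standard library is shown below. It covers the token-gated GET route, binding to a random loopback port, and auto-shutdown; multipart POST handling is deliberately omitted, and the function names are hypothetical.

```python
import hmac
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(token: str, page_html: str):
    """Build a handler class that serves the page only at /u/<token>."""
    expected_path = f"/u/{token}".encode("utf-8")

    class UploadHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Any path other than the exact token URL looks like a 404.
            if not hmac.compare_digest(self.path.encode("utf-8"), expected_path):
                self.send_error(404)
                return
            body = page_html.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the agent's console quiet

    return UploadHandler


def start_upload_server(token: str, page_html: str = "<h1>Upload</h1>",
                        timeout_seconds: float = 300.0):
    """Start the server on a free loopback port and schedule its shutdown."""
    # Port 0 asks the OS for a free port; 127.0.0.1 keeps it off the LAN.
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(token, page_html))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    timer = threading.Timer(timeout_seconds, server.shutdown)
    timer.daemon = True
    timer.start()
    return server, timer
```

The chosen port is available as `server.server_address[1]` and is what the tunnel would be pointed at. `/upload stop` would cancel the timer and call `server.shutdown()` directly.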

Phase 2: Enhanced UX + Fallback

  • Drag-and-drop upload page with file type icons, batch progress
  • Fallback tunnel chain: cloudflared → localtunnel → bore → direct URL (for LAN)
  • Platform-native "save to workspace" button for direct message attachments
  • Upload history and file management (/uploads list, /uploads delete)
  • Webhook notification to chat when upload completes (real-time confirmation)
  • Configurable: default timeout, max file size, allowed types in config.yaml
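
The fallback chain could be expressed as an ordered table of candidates, as in this sketch. Only the cloudflared and localtunnel invocations shown earlier in this issue are included; bore and the direct LAN URL are left out because their invocations are not specified here. The function name and the `which` parameter are hypothetical.

```python
import shutil

# (tunnel name, binary to detect, argv template), in fallback order.
TUNNEL_CHAIN = [
    ("cloudflared", "cloudflared",
     ["cloudflared", "tunnel", "--url", "http://localhost:{port}"]),
    ("localtunnel", "npx",
     ["npx", "localtunnel", "--port", "{port}"]),
]


def pick_tunnel(port: int, which=shutil.which):
    """Return (name, argv) for the first available tunnel, or None."""
    for name, binary, template in TUNNEL_CHAIN:
        if which(binary):
            return name, [part.format(port=port) for part in template]
    return None
```

When `pick_tunnel` returns None, the command would reply with an install hint instead of starting a session.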

Phase 3: Smart Upload + Integration


Pros & Cons

Pros

Cons / Risks

  • cloudflared dependency: Needs the cloudflared binary installed. Mitigated by fallback chain and clear error messages ("install cloudflared: brew install cloudflared")
  • Network exposure: Even with token auth, exposing a local server to the internet has inherent risk. Mitigated by short timeout, token validation, file type restrictions, size limits.
  • Cloudflare Quick Tunnel reliability: No SLA, intended for development. For a 5-minute upload window this is fine; for long-running sessions, less reliable.
  • Firewall/NAT issues: Some corporate networks block outbound tunnel connections. Fallback to direct platform upload for these cases.
  • Complexity: Adding an HTTP server + tunnel management + upload UI is non-trivial. Phase 1 should be minimal to prove the concept.
  • No upload from CLI: CLI users don't need tunnels — they have direct filesystem access. The /upload command should detect CLI and just print the workspace path.

Open Questions

  1. cloudflared vs localtunnel as primary: cloudflared is more reliable and faster but requires a binary install. localtunnel is npm-based (already a dependency for WhatsApp). Recommendation: cloudflared primary, localtunnel fallback.

  2. Upload page hosting: Embed HTML as a Python string constant, or serve from a static file? Recommendation: embed as constant for zero-dependency deployment, with option to override via ~/.hermes/upload-page.html.

  3. Automatic vs manual tunnel: Should the agent auto-detect "user wants to send me a file" and offer upload? Or only on explicit /upload command? Recommendation: explicit command only in Phase 1; smart detection ("I have a PDF to share") in Phase 3.

  4. Integration with #466 (send_file): Issue #466, "feat: File transfer between sandboxed environments and users (send_file tool)", proposes presigned URLs for large file transfers. Should upload use the same mechanism? Recommendation: different mechanisms for different directions. /upload uses tunnels (user→agent), send_file uses presigned URLs or platform APIs (agent→user).

  5. CLI behavior: What should /upload do in CLI mode? Options: (a) print workspace path, (b) open file picker dialog, (c) start local server without tunnel (for LAN access). Recommendation: print workspace path + optional local server for remote CLI sessions.
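
The recommendation in question 2 (embedded page with an optional override file) might be sketched like this. The HTML is a bare placeholder, not the proposed upload UI, and the function name is hypothetical.

```python
from pathlib import Path

# Placeholder page; the real one would add drag-and-drop and progress bars.
DEFAULT_UPLOAD_PAGE = """<!doctype html>
<html>
  <body>
    <form method="post" enctype="multipart/form-data">
      <input type="file" name="files" multiple>
      <button type="submit">Upload</button>
    </form>
  </body>
</html>
"""

OVERRIDE_PATH = Path.home() / ".hermes" / "upload-page.html"


def load_upload_page(override_path: Path = OVERRIDE_PATH) -> str:
    """Return the user's override page if readable, else the embedded default."""
    try:
        return override_path.read_text(encoding="utf-8")
    except OSError:
        # Missing or unreadable file: fall back to the built-in page.
        return DEFAULT_UPLOAD_PAGE
```

Falling back silently keeps deployment zero-dependency; logging a warning when the override exists but cannot be read would be a sensible addition.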

