
feat: File transfer between sandboxed environments and users (send_file tool) #466

@teknium1


Problem

When Hermes runs in a sandboxed terminal backend (Docker, SSH, Modal, Singularity), the agent creates files inside the sandbox but has no way to send them to the user. Similarly, when users send file attachments on messaging platforms, those files are not injected into the sandbox.

This is the #1 missing capability for production sandboxed deployments.


What We Already Have

The foundational pieces exist but are not connected:

1. Base64 File Transfer (environments/tool_context.py)

Working upload_file() / download_file() with chunked base64 piping between host and sandbox. Currently only used by RL ToolContext — NOT exposed as an agent tool.
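Chunked base64 piping works roughly like this. The sketch below is a minimal illustration, not the code in environments/tool_context.py. The helper name `build_upload_commands`, the `.b64` staging file and the 4096-character chunk size are all hypothetical, and it assumes the sandbox shell provides `printf` and `base64 -d`.

```python
import base64
import shlex


def build_upload_commands(data: bytes, remote_path: str,
                          chunk_size: int = 4096) -> list[str]:
    """Hypothetical helper: turn raw bytes into shell commands that rebuild
    the file inside a sandbox. Run the commands in order via the exec channel.
    """
    encoded = base64.b64encode(data).decode("ascii")
    staging = shlex.quote(remote_path + ".b64")
    commands = [f": > {staging}"]  # create or truncate the staging file
    for start in range(0, len(encoded), chunk_size):
        chunk = encoded[start:start + chunk_size]
        # One chunk per command keeps each exec call under argument-size limits
        commands.append(f"printf '%s' {shlex.quote(chunk)} >> {staging}")
    # Decode back to the original binary, then remove the staging file
    commands.append(f"base64 -d {staging} > {shlex.quote(remote_path)} && rm {staging}")
    return commands
```

Every byte crosses the channel as base64 text, which is where the 33% overhead in the comparison table comes from.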

2. MEDIA: Tag System (gateway/platforms/base.py)

Agent includes MEDIA:/path/to/file in response text. BasePlatformAdapter.extract_media() detects these tags and routes files by extension (images → send_image, audio → send_voice, documents → send_document). Only works for host-local files.
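The tag-and-route behaviour described above can be sketched as follows. This is not the actual BasePlatformAdapter.extract_media() code. The function name `route_media_tags` and the extension sets are assumptions, and paths are assumed to contain no whitespace.

```python
import re
from pathlib import PurePosixPath

MEDIA_RE = re.compile(r"MEDIA:(\S+)")

# Assumed extension groups; anything unmatched is sent as a document
ROUTES = {
    "send_image": {".png", ".jpg", ".jpeg", ".gif", ".webp"},
    "send_voice": {".ogg", ".mp3", ".wav", ".m4a"},
}


def route_media_tags(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Strip MEDIA: tags from text and pair each path with a send method."""
    routed = []
    for path in MEDIA_RE.findall(text):
        ext = PurePosixPath(path).suffix.lower()
        method = next((m for m, exts in ROUTES.items() if ext in exts), "send_document")
        routed.append((method, path))
    cleaned = " ".join(MEDIA_RE.sub("", text).split())  # collapse leftover spaces
    return cleaned, routed
```

Because routing only looks at the path string, a path that exists inside the sandbox but not on the host will fail later at send time. That is the gap send_file is meant to close.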

3. Platform Send Methods

| Platform | send_document | send_image_file | send_video | Status |
|---|---|---|---|---|
| Telegram | ❌ MISSING | ❌ MISSING | ❌ MISSING | Falls back to "📎 File: /path" text |
| Discord | ❌ MISSING | ❌ MISSING | ❌ MISSING | Same fallback |
| Slack | ❌ MISSING | ❌ MISSING | ❌ MISSING | Same fallback |
| WhatsApp | ✅ | ✅ | ✅ | Full support via bridge |

Platform adapters are the #1 blocker. Even if send_file existed today, files can't be delivered on 3/4 messaging platforms.


Research

Investigated 50+ sources across cloud IDEs, container orchestration, CI/CD artifact systems, AI APIs, notebook environments, novel transfer protocols, and agent codebases.

How Other Agents Handle This

| Agent/Platform | Approach |
|---|---|
| OpenHands | Workspace SDK with file_upload()/file_download(), Docker cp, download workspace as ZIP |
| Codex / Claude Code | OS-level sandboxing on local FS (avoids the problem entirely) |
| Cursor Cloud | Git as file transfer protocol (push to branch, pull from GitHub) |
| E2B | sandbox.files.read()/write() + pre-signed URLs (gold standard) |
| Composio | Schema-annotated file_uploadable/file_downloadable + S3 intermediary |
| Modal | Volume sync, sb.open() Filesystem API, CloudBucketMounts |
| Coder | REST API with tar/zip upload to content-addressed store + Mutagen sync |
| Pi-Mono | Runtime bridge with returnDownloadableFile() (web-specific) |
| Jupyter | Contents REST API (base64-in-JSON) + /files/ static serving |
| Colab | files.download() triggers browser download + Drive mount for persistence |

How AI APIs Handle Files

| API | Pattern |
|---|---|
| OpenAI Responses | Container → file_id → client.containers.files.content(cntr, file_id) |
| Anthropic Claude | Files API → file_id reference → download for tool-created files only |
| Google Gemini | Upload → file_id → 48-hour auto-expiry → no download (input only) |
| E2B | Direct SDK read/write + sandbox.downloadUrl() with presigned URLs |

Container File Transfer Patterns

| Method | Mechanism | Pros | Cons | Best For |
|---|---|---|---|---|
| Docker Archive API | get_archive/put_archive (tar stream) | Zero base64 overhead, binary-safe, no container deps | Requires Docker socket | Docker sandboxes |
| docker cp | Wraps Archive API | Simple CLI | Same as above | Quick transfers |
| SCP/SFTP | Direct file transfer over SSH | Fast, resumable, proven | Requires SSH setup | SSH sandboxes |
| Base64 over exec | base64 < file \| decode | Works on ANY exec channel | 33% overhead, slow, unreliable for large binaries | Universal fallback |
| kubectl cp | tar stream over exec API | Built into k8s | Requires tar in container | Kubernetes |
| Volume mounts | Shared filesystem | Zero-copy, real-time | Must configure at start | Known paths |
| Presigned URLs | Upload to S3/storage, share URL | Secure, scales, works in browsers | Requires object storage | Large files, messaging |
| transfer.sh | Self-hosted HTTP file hosting | Dead simple, one curl to upload | Requires HTTP outbound | Ephemeral sharing |

Key Insight: Docker Archive API > Base64

The current base64-over-exec approach is the worst-performing option in the table above. Docker's Archive API, which docker cp uses internally, moves files as a tar stream instead:

# Download: returns (iterator of tar-stream chunks, stat dict); binary-safe, no base64 overhead
bits, stat = container.get_archive("/workspace/report.pdf")

# Upload: tar_data is a tar archive (bytes); its members are extracted under /workspace/
container.put_archive("/workspace/", tar_data)
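Both Archive API calls work with tar data, not raw file bytes. The sender therefore has to wrap files in a tar, and the receiver has to unwrap the file from the tar stream. The helpers below are a minimal standard-library sketch of that step. The names `tar_single_file` and `untar_single_file` are hypothetical, and the sketch assumes one regular file per transfer.

```python
import io
import tarfile
from typing import Iterable


def tar_single_file(name: str, data: bytes, mode: int = 0o644) -> bytes:
    """Wrap one file in an in-memory tar archive, the format put_archive() expects."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mode = mode
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def untar_single_file(chunks: Iterable[bytes]) -> bytes:
    """Join the tar-stream chunks from get_archive() and return the first regular file."""
    # Buffers the whole archive in memory: fine for the MVP tiers, not for multi-GB files
    buf = io.BytesIO(b"".join(chunks))
    with tarfile.open(fileobj=buf, mode="r") as tar:
        for member in tar:
            if member.isfile():
                return tar.extractfile(member).read()
    raise ValueError("archive contains no regular file")
```

With the Docker SDK for Python, `bits, _ = container.get_archive(path)` followed by `untar_single_file(bits)` yields the file's bytes. `container.put_archive("/workspace/", tar_single_file("report.pdf", data))` writes a file back.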

Key Insight: Compression Before Base64

Following the approach of the Kitty terminal file transfer protocol, compressing with gzip before base64-encoding reduces the payload by 50-80% for text files. This should be the default for the base64 fallback path.
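The host side of a compress-then-encode fallback can be sketched in a few lines. The function names `pack`/`unpack` are hypothetical. Inside the sandbox, `gzip -c FILE | base64` produces a payload that `unpack()` can read, because `base64.b64decode` skips the line breaks that the base64 CLI inserts.

```python
import base64
import gzip


def pack(data: bytes) -> str:
    """Compress, then base64-encode: text-safe for an exec channel, smaller for text."""
    return base64.b64encode(gzip.compress(data)).decode("ascii")


def unpack(payload: str) -> bytes:
    """Reverse of pack(): base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(payload))
```

Already-compressed formats (PNG, JPEG, ZIP, most PDFs) gain little from gzip. The fallback could skip compression for those types if the extra CPU time matters.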

Messaging Platform File Limits

| Platform | Upload Limit | Notes |
|---|---|---|
| Telegram | 50 MB | A self-hosted Bot API server raises the limit to 2 GB |
| Discord | 25 MB | 500 MB with Nitro |
| Slack | 1 GB | Paid plans |
| WhatsApp | 100 MB | Documents; 16 MB for images |

For files exceeding these limits → upload to temp storage, send download URL.
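Here is a minimal sketch of the "attach or send a URL" decision, using the default limits from the table above. It assumes the platforms' "MB" means MiB, and it ignores Nitro, self-hosted Telegram servers and WhatsApp's lower image cap. `PLATFORM_LIMITS` and `delivery_mode` are hypothetical names.

```python
MB = 1024 * 1024

# Default upload limits from the table above (assumed to be MiB)
PLATFORM_LIMITS = {
    "telegram": 50 * MB,
    "discord": 25 * MB,
    "slack": 1024 * MB,
    "whatsapp": 100 * MB,
}


def delivery_mode(platform: str, size_bytes: int) -> str:
    """Return 'attach' if the file fits the platform's upload limit, else 'url'."""
    limit = PLATFORM_LIMITS.get(platform.lower())
    if limit is None:
        raise ValueError(f"unknown platform: {platform}")
    return "attach" if size_bytes <= limit else "url"
```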


Proposed Solution

Architecture

┌─────────────────────────────────────────────────────┐
│                   send_file Tool                    │
│  send_file(path, [message])                         │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│          File Transfer Layer (per-backend)          │
│  Docker:      get_archive API (tar stream)          │
│  SSH:         SCP/SFTP via ControlMaster connection │
│  Modal:       sb.open() Filesystem API              │
│  Singularity: bind mount or gzip+base64 over exec   │
│  Local:       direct filesystem (no transfer)       │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│          Delivery Layer (per-frontend)              │
│  CLI:       copy to CWD, print path                 │
│  Telegram:  bot.send_document() (< 50 MB)           │
│  Discord:   discord.File() (< 25 MB)                │
│  Slack:     files_upload_v2() (< 1 GB)              │
│  WhatsApp:  send_document() via bridge              │
│  Fallback:  presigned URL for oversized files       │
└─────────────────────────────────────────────────────┘

The send_file Tool

{
    "name": "send_file",
    "description": "Send a file from the terminal environment to the user. Works across all backends and platforms.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path to the file inside the terminal environment"
            },
            "message": {
                "type": "string",
                "description": "Optional caption/message to send with the file"
            }
        },
        "required": ["path"]
    }
}

Flow:

  1. Agent calls send_file(path="/workspace/report.pdf")
  2. Tool checks that the file exists and gets its size via the terminal
  3. Tool pulls the file to the host using the backend's transport:
      • Local: file is already on the host; resolve its absolute path
      • Docker: extract via get_archive API (tar stream, no base64)
      • SSH: extract via SCP over the existing ControlMaster connection
      • Modal/Singularity: extract via gzip+base64 over exec
  4. Save to ~/.hermes/file_cache/<uuid>_<filename>
  5. Deliver to the frontend:
      • CLI: copy to the user's CWD
      • Gateway: return MEDIA:<host_path> → platform sends via send_document()
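The host-side part of this flow (cache, then CLI copy or MEDIA tag) can be sketched as below. This is a hypothetical handler, not the planned tools/send_file_tool.py. The `fetch` callable stands in for whichever backend transport applies, and the parameter names are assumptions.

```python
import shutil
import uuid
from pathlib import Path
from typing import Callable, Optional


def send_file(path: str, fetch: Callable[[str], bytes], cache_dir: Path,
              frontend: str, cwd: Optional[Path] = None) -> str:
    """Hypothetical send_file handler following the flow above.

    `fetch` stands in for the per-backend transport (local read, get_archive,
    SCP, or gzip+base64 over exec) and returns the file's raw bytes.
    """
    data = fetch(path)  # pull the file out of the sandbox onto the host
    name = Path(path).name
    cache_dir.mkdir(parents=True, exist_ok=True)
    # UUID prefix keeps two files with the same name from colliding in the cache
    host_path = cache_dir / f"{uuid.uuid4().hex}_{name}"
    host_path.write_bytes(data)
    if frontend == "cli":
        dest = (cwd or Path.cwd()) / name
        shutil.copyfile(host_path, dest)
        return f"Saved {name} to {dest}"
    # Gateway frontends: the platform adapter turns this tag into send_document()
    return f"MEDIA:{host_path}"
```

Returning a MEDIA: tag reuses the existing extraction path in the gateway, so the tool itself needs no platform-specific code.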

User Upload → Sandbox (Receive Side)

  1. Gateway downloads user's file attachment → ~/.hermes/file_cache/
  2. Inject into sandbox via reverse of send_file transport
  3. Add context to conversation: [User uploaded: report.csv (42 KB) at /workspace/uploads/report.csv]
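The context line can be built as shown here. This is a minimal sketch assuming 1024-based size units and /workspace/uploads as the target directory. The names `human_size` and `upload_context` are hypothetical. The function keeps only the last component of the user-supplied filename, so a crafted name cannot escape the uploads directory.

```python
import posixpath


def human_size(num_bytes: int) -> str:
    """Format a byte count as B/KB/MB/GB using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.0f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def upload_context(filename: str, num_bytes: int,
                   sandbox_dir: str = "/workspace/uploads") -> str:
    """Build the note injected into the conversation after an upload."""
    # Keep only the final path component so a crafted name cannot escape sandbox_dir
    name = posixpath.basename(filename)
    if name in ("", ".", ".."):
        raise ValueError(f"unusable upload name: {filename!r}")
    sandbox_path = posixpath.join(sandbox_dir, name)
    return f"[User uploaded: {name} ({human_size(num_bytes)}) at {sandbox_path}]"
```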

Tiered Transfer Strategy

| Tier | Size | Strategy |
|---|---|---|
| 1 | < 1 MB | gzip + base64 over exec (any backend) |
| 2 | 1-25 MB | Docker: get_archive. SSH: scp. Modal: base64. |
| 3 | 25-50 MB | Extract to host → send via platform API |
| 4 | > 50 MB | Extract to host → upload to temp storage → send presigned URL |
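Tier selection from this table can be sketched as a single function. The sketch makes three assumptions that the table leaves open. Boundaries are inclusive on the upper end. Sizes are 1024-based. Backends not listed for tier 2 (for example Singularity) fall back to base64, as Modal does. `choose_tier` is a hypothetical name.

```python
MB = 1024 * 1024


def choose_tier(size_bytes: int, backend: str) -> tuple[int, str]:
    """Map a file size (and backend) to a transfer tier from the table above."""
    if size_bytes < 1 * MB:
        return 1, "gzip+base64 over exec"
    if size_bytes <= 25 * MB:
        # Tier 2 uses the backend's native channel; other backends fall back to base64
        native = {"docker": "get_archive", "ssh": "scp"}
        return 2, native.get(backend, "base64")
    if size_bytes <= 50 * MB:
        return 3, "extract to host, send via platform API"
    return 4, "extract to host, upload to temp storage, send presigned URL"
```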

Security

  • File cache auto-cleanup (TTL-based, default 1 hour)
  • Path traversal validation (realpath + prefix check)
  • Configurable max file size (default 50 MB)
  • MIME type detection via magic bytes (not extension)
  • UUID-based cache filenames (original name in metadata only)
  • Content-addressed storage (SHA-256) for dedup + integrity
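Two of these checks, the realpath prefix check and magic-byte MIME detection, are sketched below for host-side paths such as the file cache or the local backend. The signature list is a small illustrative subset (a deployment might use libmagic instead). The names `resolve_inside`, `sniff_mime` and `MAGIC` are hypothetical. `os.path.commonpath` serves as a component-wise prefix check, so a sibling like `/cache-evil` does not pass for `/cache`.

```python
import os

# A few well-known file signatures; a real deployment might use libmagic instead
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def resolve_inside(root: str, user_path: str) -> str:
    """Resolve user_path under root (realpath) and reject anything that escapes it."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, user_path))
    # Component-wise prefix check on fully resolved paths (symlinks followed)
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PermissionError(f"path escapes {root!r}: {user_path!r}")
    return candidate


def sniff_mime(head: bytes) -> str:
    """Guess a MIME type from the first bytes of a file, ignoring its extension."""
    for signature, mime in MAGIC:
        if head.startswith(signature):
            return mime
    return "application/octet-stream"
```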

Implementation Plan

Phase 0: Platform Adapter Gaps (PREREQUISITE) — ~2h

Fix the #1 blocker. Add missing methods to platform adapters:

Files to modify:

  • gateway/platforms/telegram.py — Add send_document() via bot.send_document()
  • gateway/platforms/discord.py — Add send_document() via discord.File()
  • gateway/platforms/slack.py — Add send_document() via files_upload_v2()
  • Add send_image_file() and send_video() overrides to all three

Phase 1: send_file Tool (MVP) — ~4h

Files to create:

  • tools/send_file_tool.py (~150 lines) — The tool itself

Files to modify:

  • tools/environments/base.py — Add download_file() to BaseEnvironment
  • tools/environments/docker.py — Implement via Docker Archive API
  • tools/environments/ssh.py — Implement via SCP/SFTP
  • model_tools.py — Register the new tool
  • toolsets.py — Add to appropriate toolsets (file toolset)
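The environment-side additions could take the form of an abstract pair of methods that each backend implements. The sketch below uses hypothetical class names (`FileTransferMixin`, `LocalTransfer`) rather than the existing BaseEnvironment. The signatures are assumptions. Only the local backend is filled in, since it reduces to a file copy.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class FileTransferMixin(ABC):
    """Hypothetical interface for the download_file()/upload_file() pair."""

    @abstractmethod
    def download_file(self, remote_path: str, local_path: Path) -> Path:
        """Copy a file out of the environment onto the host; return the host path."""

    @abstractmethod
    def upload_file(self, local_path: Path, remote_path: str) -> None:
        """Copy a host file into the environment at remote_path."""


class LocalTransfer(FileTransferMixin):
    """Local backend: the environment shares the host filesystem, so transfer is a copy."""

    def download_file(self, remote_path: str, local_path: Path) -> Path:
        local_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(remote_path, local_path)
        return local_path

    def upload_file(self, local_path: Path, remote_path: str) -> None:
        Path(remote_path).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, remote_path)
```

A Docker implementation would wrap get_archive/put_archive, and an SSH one would wrap SCP/SFTP, behind the same two methods.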

Phase 2: User Upload → Sandbox — ~4h

Files to modify:

  • gateway/run.py — Detect user file attachments, download to cache
  • tools/environments/base.py — Add upload_file() to BaseEnvironment
  • Conversation injection — Add file context to user message

Phase 3: Large File Handling — ~4h (optional)

  • Content-addressed file cache (~/.hermes/file_cache/{hash[:2]}/{hash})
  • S3/MinIO presigned URL generation for oversized files
  • Auto-cleanup daemon with configurable TTL
  • Telegram file_id caching for dedup on re-sends
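The content-addressed layout and TTL cleanup can be sketched with the standard library alone. This sketch assumes mtime-based expiry with the 1-hour default from the Security section, and a cache hit refreshes the TTL. The names `cache_put` and `cache_cleanup` are hypothetical.

```python
import hashlib
import os
import time
from pathlib import Path
from typing import Optional


def cache_put(cache_root: Path, data: bytes) -> Path:
    """Store data at cache_root/{hash[:2]}/{hash}; identical content is stored once."""
    digest = hashlib.sha256(data).hexdigest()
    target = cache_root / digest[:2] / digest
    if target.exists():
        os.utime(target)  # cache hit: refresh mtime so the TTL restarts
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.parent / f"{digest}.{os.getpid()}.tmp"
    tmp.write_bytes(data)
    os.replace(tmp, target)  # atomic rename: readers never see a half-written file
    return target


def cache_cleanup(cache_root: Path, ttl_seconds: float = 3600,
                  now: Optional[float] = None) -> int:
    """Delete cached files older than ttl_seconds (by mtime); return the count removed."""
    now = time.time() if now is None else now
    removed = 0
    for path in cache_root.glob("*/*"):
        if path.is_file() and now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed += 1
    return removed
```

The two-character prefix directory keeps any single directory from growing too large, and the SHA-256 name doubles as an integrity check on re-read.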

Estimated Effort

| Phase | Effort | Priority | New Code |
|---|---|---|---|
| Phase 0: Platform gaps | ~2h | PREREQUISITE | ~130 lines |
| Phase 1: send_file tool | ~4h | HIGH | ~150 lines |
| Phase 2: User uploads | ~4h | MEDIUM | ~100 lines |
| Phase 3: Large files | ~4h | LOW | ~150 lines |
| Total | ~14h | | ~530 lines |

Phases 0+1 deliver the core value. Phase 2 completes bidirectional flow. Phase 3 handles edge cases.


Design Doc

Full design document with backend-specific implementation notes: plans/file-transfer.md

Research Sources

Cloud IDEs: Gitpod, Coder, code-server, JetBrains Gateway, VS Code Remote
Notebooks: Jupyter, Colab, Kaggle, IPython FileLink
Containers: Docker Archive API, kubectl cp, Podman
CI/CD: GitHub Actions artifacts, GitLab CI, Jenkins stash/unstash
AI APIs: OpenAI Responses API, Anthropic Files API, Gemini File API, E2B
Novel tools: transfer.sh, croc, magic-wormhole, Kitty protocol, Mutagen, rclone
Protocols: tus (resumable uploads), WebDAV, 9P, ZMODEM, WebContainers
Agent codebases: OpenHands, Composio, Pi-Mono, Codex, Cline, OpenCode
