The in-sandbox agent powering hf sandbox / huggingface_hub.Sandbox — isolated cloud
machines built on Hugging Face Jobs.
A single static binary (~671KB, x86_64 musl, zero runtime dependencies) that runs in any
Docker image with /bin/sh — no Python, pip, or framework required. The Sandbox client
injects it at job startup and talks to it through the Jobs proxy
(https://<job_id>--<port>.hf.jobs).
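As a small illustration of the address scheme above, the helper below formats the proxy base URL for a job. The function name `proxy_base_url` is hypothetical and not part of `huggingface_hub`; the default port of 49983 is the one the client is described as using.

```python
def proxy_base_url(job_id: str, port: int = 49983) -> str:
    """Format the Jobs proxy address https://<job_id>--<port>.hf.jobs.

    Hypothetical helper for illustration only.
    """
    if not job_id:
        raise ValueError("job_id must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError("port must be in 1..65535")
    return f"https://{job_id}--{port}.hf.jobs"
```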
The same binary serves both:
- Dedicated mode: one job is one sandbox. Operations hit /v1/exec, /v1/files/*, /v1/procs/* directly. Full VM isolation; used for GPU / untrusted workloads.
- Host mode: one job hosts many lightweight sandboxes (huggingface_hub.SandboxPool). A sandbox is a dedicated uid + a private 0700 home + a per-sandbox Landlock LSM ruleset, created server-side in ~1ms. Operations are scoped under /v1/sandboxes/{id}/* and run as the sandbox uid, confined to its home. This packs dozens of isolated CPU sandboxes into one VM with sub-second per-sandbox cold start.
Host mode needs root + CAP_SETUID/SETGID/KILL (the Docker default on HF Jobs) and degrades
to uid-only isolation if Landlock is unavailable. See src/landlock.rs for the confinement
model (FS → own home + RO system dirs; no TCP bind; ABI-6 abstract-socket scoping).
- Command execution with live output streaming (NDJSON over chunked HTTP/1.1, flushed per event — the HTTP layer is hand-rolled because mainstream minimal frameworks buffer chunked responses until completion).
- Background processes: registry with buffered logs (4 MiB ring per process), follow mode, wait, kill (process-group signals), stdin injection.
- File API: raw-body read/write (no base64), offset/length params for parallel ranged transfers, list/stat/delete/mkdir.
- Keepalive pings every 15s on all streams so proxies never kill idle connections.
- Idle watchdog: exits when no request arrives and no process runs for SBX_IDLE_TIMEOUT seconds, so abandoned sandboxes stop billing.
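A client consuming the NDJSON streams has to decode one JSON object per line and ignore the keepalive pings. The sketch below shows one way to do that over any iterable of byte lines. The event field names (`type`, `data`, `code`) are assumptions made for illustration; the actual wire format of the events is not specified here.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield decoded NDJSON events, skipping blank lines and keepalive pings."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        event = json.loads(line)
        # ASSUMPTION: the event kind is stored under a "type" key.
        if event.get("type") == "ping":
            continue
        yield event


def collect_output(lines: Iterable[bytes]) -> Tuple[str, Optional[int]]:
    """Concatenate stdout chunks and return them with the exit code."""
    stdout_parts = []
    exit_code = None
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "stdout":
            # ASSUMPTION: output text is carried in a "data" field.
            stdout_parts.append(event.get("data", ""))
        elif kind == "exit":
            # ASSUMPTION: the exit status is carried in a "code" field.
            exit_code = event.get("code")
            break
    return "".join(stdout_parts), exit_code
```

Because the parser works line by line, output can be handed to the caller as soon as each event arrives instead of waiting for the command to finish.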
GET /health → {"status","version","uptime_ms"} (no auth)
POST /v1/exec {cmd, shell?, env?, cwd?, timeout?, stdin?, background?, tag?}
foreground → NDJSON stream: start / stdout / stderr / ping / exit
background → {"pid", "tag"}
GET /v1/procs → process list
GET /v1/procs/{pid}/logs?follow= → NDJSON replay (+live)
GET /v1/procs/{pid}/wait → NDJSON pings until exit event
POST /v1/procs/{pid}/kill {signal?} → default SIGKILL, to the process group
POST /v1/procs/{pid}/stdin?eof= → raw body to stdin
GET /v1/files/read?path=&offset=&length= → raw bytes
PUT /v1/files/write?path=&mode=&offset= → raw body to file (parents created)
GET /v1/files/list?path= /stat?path=
DELETE /v1/files/delete?path=&recursive=
POST /v1/files/mkdir?path=
# host mode (many sandboxes per job)
POST /v1/sandboxes {count?, env?, max_procs?, max_mem_mb?} → {"sandboxes":[{id,uid,home}]}
GET /v1/sandboxes → live sandbox list
DELETE /v1/sandboxes → delete all
DELETE /v1/sandboxes/{id} → delete one (frees the uid)
# every dedicated route above also exists scoped to a sandbox, e.g.:
POST /v1/sandboxes/{id}/exec ... GET /v1/sandboxes/{id}/procs
GET /v1/sandboxes/{id}/files/read ... PUT /v1/sandboxes/{id}/files/write
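The scoping rule and the offset/length parameters can be combined into a plan for a parallel ranged download. The sketch below only builds request paths and performs no I/O. The helper names `route` and `ranged_read_urls` are hypothetical, and the file size is assumed to be known by the caller (for example from a prior stat call, whose response shape is not described here).

```python
from typing import List, Optional
from urllib.parse import urlencode


def route(endpoint: str, sandbox_id: Optional[str] = None) -> str:
    """Return the dedicated route, or the sandbox-scoped one in host mode."""
    endpoint = endpoint.lstrip("/")
    if sandbox_id is None:
        return f"/v1/{endpoint}"
    return f"/v1/sandboxes/{sandbox_id}/{endpoint}"


def ranged_read_urls(
    path: str,
    size: int,
    chunk_size: int,
    sandbox_id: Optional[str] = None,
) -> List[str]:
    """Split a file of `size` bytes into read requests of at most `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    urls = []
    for offset in range(0, size, chunk_size):
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, size - offset)
        query = urlencode({"path": path, "offset": offset, "length": length})
        urls.append(f"{route('files/read', sandbox_id)}?{query}")
    return urls
```

Each URL can then be fetched by a separate worker and the raw bodies written at their offsets, which is the kind of parallelism the ranged parameters are meant to allow.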
cmd is either a string (run via /bin/sh -c) or an argv array. Pass shell (bool) to make
that choice explicit instead of inferring it from the type: shell=true requires a string,
shell=false requires an argv array. In host mode, file paths
are rooted at the sandbox's private home (a leading / is taken relative to it) and created
files are chowned to the sandbox uid.
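The cmd/shell rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the actual client code: `build_exec_payload` is a hypothetical helper, and it only mirrors the rules and field names listed for POST /v1/exec.

```python
from typing import Optional, Sequence, Union

# Optional request fields listed for POST /v1/exec.
_EXEC_OPTIONS = {"env", "cwd", "timeout", "stdin", "background", "tag"}


def build_exec_payload(
    cmd: Union[str, Sequence[str]],
    shell: Optional[bool] = None,
    **options,
) -> dict:
    """Build a JSON-serializable body for POST /v1/exec."""
    is_string = isinstance(cmd, str)
    if shell is True and not is_string:
        raise TypeError("shell=True requires cmd to be a string")
    if shell is False and is_string:
        raise TypeError("shell=False requires cmd to be an argv array")
    if not is_string:
        cmd = list(cmd)
        if not cmd or not all(isinstance(arg, str) for arg in cmd):
            raise TypeError("argv must be a non-empty sequence of strings")
    unknown = set(options) - _EXEC_OPTIONS
    if unknown:
        raise ValueError(f"unknown exec options: {sorted(unknown)}")
    payload = {"cmd": cmd}
    if shell is not None:
        # Only send the flag when the caller made the choice explicit.
        payload["shell"] = shell
    payload.update(options)
    return payload
```

For example, `build_exec_payload("ls -la | wc -l", shell=True)` describes a shell pipeline, while `build_exec_payload(["python3", "-c", "print(1)"], shell=False)` passes an argv array that bypasses /bin/sh.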
| var | default | meaning |
|---|---|---|
| SBX_PORT | 8000 | listen port (the client uses 49983 to keep common dev ports free) |
| SBX_TOKEN | unset | if set, all endpoints except /health require the X-Sandbox-Token header (constant-time compare); removed from the env before any child process spawns |
| SBX_IDLE_TIMEOUT | unset | seconds of inactivity (no authed request, no running process) before clean exit |
Two layers when running on HF Jobs:
- The Jobs proxy requires an HF token with read access to the job's namespace.
- SBX_TOKEN is delivered via encrypted job secrets; the client derives it as HMAC-SHA256(user_hf_token, nonce) with the nonce stored in job labels, so reconnection is stateless and the HF token itself never enters the sandbox.
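The token derivation can be sketched with Python's standard library. The HF token is taken as the HMAC key and the nonce as the message, following the argument order HMAC-SHA256(user_hf_token, nonce). The UTF-8 encoding of both inputs and the hex output format are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac


def derive_sandbox_token(hf_token: str, nonce: str) -> str:
    """Derive a sandbox token as HMAC-SHA256 keyed by the HF token."""
    # ASSUMPTION: inputs are UTF-8 encoded and the digest is hex-encoded.
    return hmac.new(
        hf_token.encode("utf-8"),
        nonce.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def tokens_match(presented: str, expected: str) -> bool:
    """Compare two tokens in constant time."""
    return hmac.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )
```

Because the nonce is stored with the job and the HF token stays on the client, the same token can be recomputed on reconnect without persisting anything locally.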
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl
# → target/x86_64-unknown-linux-musl/release/sbx-server (static-pie, stripped)

The binary is distributed via a Hugging Face model repo and downloaded at job startup by a
/bin/sh bootstrap (wget → curl → python3 fallback chain).
Working prototype. See the huggingface_hub draft PR for the client, CLI, design notes and
benchmarks (cold start ~6s, exec ~110ms p50, 340+ MiB/s parallel file transfer).