Evaluate and implement code execution sandboxing (Docker/WASM/Firecracker) #50

@Aureliolo

Context

Implement safe code execution capabilities for agents. When agents need to run code (e.g., a developer agent writing and testing code), execution must be sandboxed so that agent-generated code cannot compromise the host.

Note: The subprocess sandbox for file system and git tools is handled separately as an M3 issue (see §11.1.2). This issue covers stronger isolation backends for code execution — running arbitrary agent-generated code safely.

Options under evaluation:

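  • Docker containers: process-level isolation via namespaces and cgroups (shared host kernel), higher startup overhead than WASM
  • WASM sandbox: emerging technology, language support varies
  • Firecracker / gVisor: microVM isolation (Firecracker) or a user-space kernel that intercepts syscalls (gVisor), production-grade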

Evaluation criteria:

  • Security isolation strength
  • Startup time and resource overhead
  • Language support breadth
  • Filesystem and network control granularity

Acceptance Criteria

  • 3+ sandboxing options evaluated with documented pros/cons
  • Chosen solution implemented as a code execution tool
  • Code execution tool uses the sandbox for all agent-triggered code
  • Filesystem isolation (agents cannot access host filesystem)
  • Configurable network access (allow/deny per sandbox)
  • Timeout enforcement (configurable per execution)
  • Resource limits: CPU time, memory usage, disk space
  • Stdout/stderr capture and return to agent
  • Cleanup after execution (no persistent state unless configured)
  • Unit tests for sandbox creation, execution, and resource limits
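
Several of the criteria above (filesystem isolation, per-sandbox network access, resource limits, cleanup) map onto standard `docker run` flags. The sketch below is illustrative only: the `SandboxLimits` dataclass, its field names, the `build_docker_argv` helper, and the image name are hypothetical, and it builds a Docker CLI argument list rather than using the aiodocker client named in D16. Wall-clock timeout enforcement is not a `docker run` flag and is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Hypothetical per-execution settings; field names are illustrative."""

    image: str = "sandbox-image:latest"  # placeholder, not a real published image
    network_enabled: bool = False        # deny network access by default
    memory_mb: int = 256
    cpus: float = 1.0
    pids_limit: int = 128
    tmpfs_mb: int = 64                   # size of the only writable scratch area
    use_gvisor: bool = False             # opt-in hardening via the runsc runtime


def build_docker_argv(limits: SandboxLimits, command: list[str]) -> list[str]:
    """Translate the limits into a `docker run` argument list (nothing is executed)."""
    argv = [
        "docker", "run",
        "--rm",                                   # remove the container after exit
        "--read-only",                            # read-only root filesystem
        "--tmpfs", f"/tmp:rw,size={limits.tmpfs_mb}m",
        "--memory", f"{limits.memory_mb}m",
        "--cpus", str(limits.cpus),
        "--pids-limit", str(limits.pids_limit),
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
    ]
    if not limits.network_enabled:
        argv += ["--network", "none"]
    if limits.use_gvisor:
        argv += ["--runtime=runsc"]
    argv.append(limits.image)
    argv += command
    return argv
```

Because no host paths are bind-mounted and the root filesystem is read-only, the only writable location in this sketch is the size-capped tmpfs, which disappears with the container.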

Dependencies

Design Spec Reference

  • §11.1.2 — Sandbox Backends (Docker, K8s rows)
  • §15.3 — tools/sandbox/ directory structure

Design Decisions Finalized

  • D16: Sandbox Backend
      • Docker is the only backend for the MVP, accessed via aiodocker (async-native, Python 3.14 support).
      • Pre-built image (Python 3.14 + Node.js LTS, <500 MB), user-configurable.
      • Fail with a clear error if Docker is unavailable (no unsafe fallback).
      • gVisor (--runtime=runsc) available as config-level hardening.
      • SandboxBackend protocol for future backends.
      • Evaluate alternatives post-M7.

Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future implementation.
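
As a rough illustration of the pluggable-protocol pattern and the "no unsafe fallback" rule, the sketch below defines a minimal backend interface using `typing.Protocol`. Everything in it is an assumption made for illustration: the method names, `ExecutionResult`, `SandboxUnavailableError`, the `execute` helper, and the in-memory `FakeBackend` are hypothetical and do not reflect the actual `SandboxBackend` definition in the design spec. A real Docker backend would replace `FakeBackend`.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class SandboxUnavailableError(RuntimeError):
    """Raised instead of falling back to unsandboxed execution."""


@dataclass(frozen=True)
class ExecutionResult:
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool = False


class SandboxBackend(Protocol):
    """Hypothetical interface; the real protocol may differ."""

    name: str

    async def is_available(self) -> bool: ...
    async def run(self, code: str, timeout_s: float) -> ExecutionResult: ...


class FakeBackend:
    """In-memory stand-in used only to exercise the interface."""

    name = "fake"

    def __init__(self, available: bool = True, delay_s: float = 0.0) -> None:
        self._available = available
        self._delay_s = delay_s

    async def is_available(self) -> bool:
        return self._available

    async def run(self, code: str, timeout_s: float) -> ExecutionResult:
        await asyncio.sleep(self._delay_s)  # simulate execution time
        return ExecutionResult(exit_code=0, stdout=f"ran {len(code)} chars", stderr="")


async def execute(
    backend: SandboxBackend, code: str, timeout_s: float = 30.0
) -> ExecutionResult:
    """Run code through the backend, refusing to run if it is unavailable."""
    if not await backend.is_available():
        raise SandboxUnavailableError(
            f"sandbox backend {backend.name!r} is not available; "
            "refusing to run code unsandboxed"
        )
    try:
        return await asyncio.wait_for(backend.run(code, timeout_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Report the timeout to the agent rather than raising.
        return ExecutionResult(
            exit_code=-1, stdout="", stderr="execution timed out", timed_out=True
        )
```

Calling `asyncio.run(execute(FakeBackend(), "print(1)"))` returns an `ExecutionResult`, while a backend constructed with `available=False` raises `SandboxUnavailableError` before any code runs.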

Metadata

Assignees

No one assigned

    Labels

      • prio:high (Important, should be prioritized)
      • scope:large (3+ days of work)
      • spec:security (DESIGN_SPEC Section 12 - Security & Approval System)
      • spec:tools (DESIGN_SPEC Section 11 - Tool & Capability System)
      • type:feature (New feature implementation)
      • type:research (Evaluate options, make tech decisions)
      • type:test (Test coverage, test infrastructure)
