Evaluate and implement code execution sandboxing (Docker/WASM/Firecracker) #50

## Context

Implement safe code execution for agents. When an agent needs to run code (e.g., a developer agent writing and testing code), execution must be sandboxed so that untrusted, agent-generated code cannot compromise the host system.

**Note:** The subprocess sandbox for filesystem and git tools is handled separately as an M3 issue (see §11.1.2). This issue covers **stronger isolation backends for code execution**, i.e. running arbitrary agent-generated code safely.

**Options under evaluation:**
- **Docker containers** — Strong isolation, higher startup overhead
- **WASM sandbox** — Emerging technology, language support varies
- **Firecracker / gVisor** — Stronger-than-container isolation (Firecracker microVMs; gVisor user-space kernel), production-grade

**Evaluation criteria:**
- Security isolation strength
- Startup time and resource overhead
- Language support breadth
- Filesystem and network control granularity

## Acceptance Criteria

- [ ] 3+ sandboxing options evaluated with documented pros/cons
- [ ] Chosen solution implemented as a code execution tool
- [ ] Code execution tool uses the sandbox for all agent-triggered code
- [ ] Filesystem isolation (agents cannot access host filesystem)
- [ ] Configurable network access (allow/deny per sandbox)
- [ ] Timeout enforcement (configurable per execution)
- [ ] Resource limits: CPU time, memory usage, disk space
- [ ] Stdout/stderr capture and return to agent
- [ ] Cleanup after execution (no persistent state unless configured)
- [ ] Unit tests for sandbox creation, execution, and resource limits
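One way to carry these requirements through the tool is a pair of plain data types validated at construction time. This is a minimal sketch; the names (`ExecutionRequest`, `ExecutionResult`, `NetworkPolicy`) and the default limits are hypothetical placeholders, not values from the design spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NetworkPolicy(Enum):
    DENY = "deny"    # no network access (safe default)
    ALLOW = "allow"  # backend-default network access


@dataclass(frozen=True)
class ExecutionRequest:
    """Hypothetical per-execution settings mirroring the acceptance criteria."""

    command: tuple[str, ...]
    timeout_s: float = 30.0               # wall-clock timeout
    cpu_time_s: int = 10                  # CPU-time limit in seconds
    memory_bytes: int = 256 * 1024**2     # memory cap
    disk_bytes: int = 64 * 1024**2        # size of the writable scratch area
    network: NetworkPolicy = NetworkPolicy.DENY
    keep_state: bool = False              # clean up after execution unless set

    def __post_init__(self) -> None:
        if not self.command:
            raise ValueError("command must not be empty")
        for name in ("timeout_s", "cpu_time_s", "memory_bytes", "disk_bytes"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


@dataclass(frozen=True)
class ExecutionResult:
    """What the tool hands back to the agent."""

    exit_code: Optional[int]  # None if the process was killed before exiting
    stdout: str
    stderr: str
    timed_out: bool = False
```

Rejecting non-positive limits up front means a misconfigured request fails before any sandbox is created, and the deny-by-default network policy matches the "configurable network access" criterion without making network access the implicit choice.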

## Dependencies

- M3 subprocess sandbox issue (basic sandboxing must exist first)
- #24 — Tool execution framework

## Design Spec Reference

- §11.1.2 — Sandbox Backends (Docker, K8s rows)
- §15.3 — tools/sandbox/ directory structure

---

## Design Decisions Finalized

- **D16 — Sandbox Backend:** Docker only for the MVP, via `aiodocker` (async-native, supports Python 3.14). Ship a pre-built image (Python 3.14 + Node.js LTS, <500MB) and let users configure their own. If Docker is unavailable, fail with a clear error (no unsafe fallback). Offer gVisor (`--runtime=runsc`) as config-level hardening. Define a `SandboxBackend` protocol so future backends can be added. Evaluate alternatives post-M7.
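A sketch of how D16 could map onto Docker, assuming the `aiodocker` client. The hypothetical `build_container_config` turns sandbox settings into a Docker Engine API container-create body: `NetworkMode: none` for deny-by-default networking, `Memory`/`NanoCpus`/`PidsLimit` for resource caps, an RLIMIT_CPU ulimit for CPU time, a read-only root filesystem with a size-capped tmpfs as the only writable path (no host bind mounts), and `Runtime: runsc` when gVisor hardening is enabled (this assumes gVisor is installed and registered with Docker as `runsc`). `DockerSandboxConfig` and `run_in_docker` are illustrative names; `run_in_docker` is an untested outline, so check its `aiodocker` calls against the installed version.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DockerSandboxConfig:
    # Hypothetical settings; field names are illustrative, not from the spec.
    image: str
    network_enabled: bool = False
    memory_bytes: int = 256 * 1024**2
    cpus: float = 1.0
    cpu_time_s: int = 10
    scratch_size: str = "64m"
    pids_limit: int = 128
    use_gvisor: bool = False


def build_container_config(cfg: DockerSandboxConfig, cmd: list[str]) -> dict[str, Any]:
    """Translate sandbox settings into a Docker Engine API container-create body."""
    host_config: dict[str, Any] = {
        "Memory": cfg.memory_bytes,
        "MemorySwap": cfg.memory_bytes,  # equal to Memory: no additional swap
        "NanoCpus": int(cfg.cpus * 1e9),
        "PidsLimit": cfg.pids_limit,
        # RLIMIT_CPU caps CPU seconds; NanoCpus only caps the CPU share.
        "Ulimits": [{"Name": "cpu", "Soft": cfg.cpu_time_s, "Hard": cfg.cpu_time_s}],
        # No bind mounts: the only writable path is a size-capped tmpfs.
        # `exec` is explicit so compiled test binaries can run from it.
        "ReadonlyRootfs": True,
        "Tmpfs": {"/workspace": f"rw,exec,nosuid,size={cfg.scratch_size}"},
        "CapDrop": ["ALL"],
        "SecurityOpt": ["no-new-privileges"],
        "NetworkMode": "bridge" if cfg.network_enabled else "none",
    }
    if cfg.use_gvisor:
        host_config["Runtime"] = "runsc"  # equivalent of `docker run --runtime=runsc`
    return {
        "Image": cfg.image,
        "Cmd": cmd,
        "WorkingDir": "/workspace",
        "NetworkDisabled": not cfg.network_enabled,
        "HostConfig": host_config,
    }


async def run_in_docker(
    cfg: DockerSandboxConfig, cmd: list[str], timeout_s: float
) -> tuple[Optional[int], str, str, bool]:
    """Untested outline: run `cmd` in a fresh container and always remove it."""
    import aiodocker  # lazy import: build_container_config needs no dependency

    docker = aiodocker.Docker()
    container = None
    try:
        container = await docker.containers.run(config=build_container_config(cfg, cmd))
        try:
            status = await asyncio.wait_for(container.wait(), timeout=timeout_s)
            exit_code, timed_out = status["StatusCode"], False
        except asyncio.TimeoutError:
            # The forced delete below also stops the still-running container.
            exit_code, timed_out = None, True
        stdout = "".join(await container.log(stdout=True))
        stderr = "".join(await container.log(stderr=True))
        return exit_code, stdout, stderr, timed_out
    finally:
        if container is not None:
            await container.delete(force=True)  # no persistent state
        await docker.close()
```

Because the create body is a plain dict, the network and resource-limit rules can be unit-tested without a Docker daemon; only `run_in_docker` needs one.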

**Common pattern:** All strategies use pluggable protocol interfaces with one initial implementation each. Alternative strategies are documented in DESIGN_SPEC.md for future consideration.
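The protocol-plus-one-implementation pattern can be sketched with `typing.Protocol`. Everything here is hypothetical (the method set of `SandboxBackend`, `run_sandboxed`, `SandboxUnavailableError`, and the in-memory `FakeBackend` for unit tests); it shows how timeout enforcement, guaranteed cleanup, and the no-unsafe-fallback rule can live in backend-agnostic code rather than in each backend.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol


class SandboxUnavailableError(RuntimeError):
    """Raised instead of silently falling back to unsandboxed execution."""


@dataclass(frozen=True)
class ExecResult:
    exit_code: Optional[int]
    stdout: str
    stderr: str
    timed_out: bool = False


class SandboxBackend(Protocol):
    """Hypothetical shape of the pluggable backend interface."""

    async def is_available(self) -> bool: ...
    async def create(self) -> str: ...  # returns a sandbox id
    async def execute(self, sandbox_id: str, cmd: list[str]) -> ExecResult: ...
    async def destroy(self, sandbox_id: str) -> None: ...


async def run_sandboxed(
    backend: SandboxBackend, cmd: list[str], timeout_s: float
) -> ExecResult:
    if not await backend.is_available():
        raise SandboxUnavailableError(
            f"{type(backend).__name__} is unavailable; refusing to run code unsandboxed"
        )
    sandbox_id = await backend.create()
    try:
        return await asyncio.wait_for(backend.execute(sandbox_id, cmd), timeout=timeout_s)
    except asyncio.TimeoutError:
        return ExecResult(exit_code=None, stdout="", stderr="", timed_out=True)
    finally:
        # Always tear down, on success, timeout, or error: no persistent state.
        await backend.destroy(sandbox_id)


class FakeBackend:
    """In-memory stand-in for unit tests; `delay_s` simulates a slow program."""

    def __init__(self, available: bool = True, delay_s: float = 0.0) -> None:
        self.available = available
        self.delay_s = delay_s
        self.live: set[str] = set()
        self._counter = 0

    async def is_available(self) -> bool:
        return self.available

    async def create(self) -> str:
        self._counter += 1
        sandbox_id = f"sbx-{self._counter}"
        self.live.add(sandbox_id)
        return sandbox_id

    async def execute(self, sandbox_id: str, cmd: list[str]) -> ExecResult:
        await asyncio.sleep(self.delay_s)
        return ExecResult(exit_code=0, stdout=" ".join(cmd), stderr="")

    async def destroy(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)
```

For a Docker backend, `destroy` would force-remove the container, which also stops a process that was still running when the timeout fired; the fake backend lets the timeout and cleanup paths be unit-tested without Docker.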
