Evaluate and implement code execution sandboxing (Docker/WASM/Firecracker) #50
Closed
Labels
- prio:high: Important, should be prioritized
- scope:large: 3+ days of work
- spec:security: DESIGN_SPEC Section 12 - Security & Approval System
- spec:tools: DESIGN_SPEC Section 11 - Tool & Capability System
- type:feature: New feature implementation
- type:research: Evaluate options, make tech decisions
- type:test: Test coverage, test infrastructure
Description
Context
Implement safe code execution capabilities for agents. When agents need to run code (e.g., a developer agent writing and testing code), execution must be sandboxed so that agent-generated code cannot read, modify, or exhaust resources on the host.
Note: The subprocess sandbox for file system and git tools is handled separately as an M3 issue (see §11.1.2). This issue covers stronger isolation backends for code execution — running arbitrary agent-generated code safely.
Options under evaluation:
- Docker containers — Strong isolation, higher startup overhead
- WASM sandbox — Emerging technology, language support varies
- Firecracker / gVisor — VM-level isolation, production-grade
Evaluation criteria:
- Security isolation strength
- Startup time and resource overhead
- Language support breadth
- Filesystem and network control granularity
Acceptance Criteria
- 3+ sandboxing options evaluated with documented pros/cons
- Chosen solution implemented as a code execution tool
- Code execution tool uses the sandbox for all agent-triggered code
- Filesystem isolation (agents cannot access host filesystem)
- Configurable network access (allow/deny per sandbox)
- Timeout enforcement (configurable per execution)
- Resource limits: CPU time, memory usage, disk space
- Stdout/stderr capture and return to agent
- Cleanup after execution (no persistent state unless configured)
- Unit tests for sandbox creation, execution, and resource limits
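The timeout and stdout/stderr criteria above can be sketched independently of the isolation backend. The following is a minimal illustration, not the implemented tool: `ExecutionResult` and `run_with_timeout` are hypothetical names, and a plain child interpreter stands in for the sandbox, so this provides no isolation at all. It only shows the intended contract: captured output is returned to the caller, and a run that exceeds its budget is killed and reported as timed out.

```python
import asyncio
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecutionResult:
    """Hypothetical result type handed back to the agent."""
    stdout: str
    stderr: str
    exit_code: Optional[int]  # None when the run was killed on timeout
    timed_out: bool


async def run_with_timeout(code: str, timeout_s: float) -> ExecutionResult:
    # Stand-in for a sandboxed run: a plain child interpreter, NOT isolated.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
        return ExecutionResult(out.decode(), err.decode(), proc.returncode, False)
    except asyncio.TimeoutError:
        # Enforce the budget: kill the child, then reap it and drain the pipes.
        # Output produced before the kill may be incomplete.
        proc.kill()
        out, err = await proc.communicate()
        return ExecutionResult(out.decode(), err.decode(), None, True)
```

A real backend would apply the same pattern around the container lifecycle (start, wait with a deadline, kill, collect logs, remove).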
Dependencies
- M3 subprocess sandbox issue (basic sandboxing must exist first)
- End-to-end integration test: single agent receives and completes a task #24 — Tool execution framework
Design Spec Reference
- §11.1.2 — Sandbox Backends (Docker, K8s rows)
- §15.3 — tools/sandbox/ directory structure
Design Decisions Finalized
- D16 (Sandbox Backend): Docker MVP only via aiodocker (async-native, Python 3.14 support). Pre-built image (Python 3.14 + Node.js LTS, <500MB) + user-configurable. Fail with clear error if Docker unavailable (no unsafe fallback). gVisor (--runtime=runsc) as config-level hardening. SandboxBackend protocol for future backends. Evaluate alternatives post-M7.
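As a rough sketch of how the acceptance criteria might map onto a Docker container, the function below builds a container-create payload in the Docker Engine API format (`HostConfig.Memory`, `NanoCpus`, `PidsLimit`, `NetworkMode`, `ReadonlyRootfs`, `Tmpfs`, `Runtime`). To my understanding, aiodocker passes a config dict of this shape through to the Engine API, but that wiring is not shown here. `SandboxLimits`, `build_container_config`, the `/workspace` path, and all default values are assumptions made for illustration. Note that `NanoCpus` caps CPU share rather than total CPU time, and the size-capped tmpfs only approximates a disk quota.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Hypothetical per-execution limits; names and defaults are illustrative."""
    memory_mb: int = 256
    cpus: float = 1.0
    pids: int = 64
    scratch_mb: int = 64
    allow_network: bool = False
    use_gvisor: bool = False


def build_container_config(image: str, command: list[str], limits: SandboxLimits) -> dict:
    """Translate limits into a Docker Engine API container-create payload."""
    host_config = {
        "Memory": limits.memory_mb * 1024 * 1024,       # bytes
        "NanoCpus": int(limits.cpus * 1_000_000_000),   # 1e9 units = one CPU
        "PidsLimit": limits.pids,                       # guards against fork bombs
        "NetworkMode": "bridge" if limits.allow_network else "none",
        "ReadonlyRootfs": True,                         # image contents are immutable
        # No bind mounts: the only writable path is a size-capped in-memory scratch dir.
        "Tmpfs": {"/workspace": f"rw,size={limits.scratch_mb}m"},
    }
    if limits.use_gvisor:
        host_config["Runtime"] = "runsc"                # gVisor as opt-in hardening
    return {
        "Image": image,
        "Cmd": command,
        "WorkingDir": "/workspace",
        "HostConfig": host_config,
    }
```

Keeping this translation a pure function makes the resource-limit unit tests cheap: they can assert on the generated payload without a running Docker daemon.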
Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future work.
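The pluggable-protocol pattern combined with the "no unsafe fallback" rule could look roughly like the sketch below. Only the name `SandboxBackend` comes from the decision above; its methods, `SandboxUnavailableError`, `FakeBackend`, and `run_in_sandbox` are hypothetical and exist only to show the shape of the idea. The actual protocol in the spec may differ.

```python
import asyncio
from typing import Protocol


class SandboxUnavailableError(RuntimeError):
    """Raised instead of silently falling back to unsandboxed execution."""


class SandboxBackend(Protocol):
    # Method names below are assumptions, not taken from the design spec.
    name: str

    async def is_available(self) -> bool: ...

    async def execute(self, code: str, timeout_s: float) -> str: ...


class FakeBackend:
    """In-memory stand-in used only to exercise the protocol."""

    name = "fake"

    def __init__(self, available: bool = True) -> None:
        self._available = available

    async def is_available(self) -> bool:
        return self._available

    async def execute(self, code: str, timeout_s: float) -> str:
        # Does not run anything; reports what it was asked to run.
        return f"ran {len(code)} chars"


async def run_in_sandbox(backend: SandboxBackend, code: str, timeout_s: float = 30.0) -> str:
    # Fail loudly when the backend is missing; never run the code on the host.
    if not await backend.is_available():
        raise SandboxUnavailableError(
            f"sandbox backend '{backend.name}' is unavailable; refusing to run code unsandboxed"
        )
    return await backend.execute(code, timeout_s)
```

Because the protocol is structural, a Docker-backed class and any later backend only need to provide the same attributes; callers such as the code execution tool depend on the protocol alone.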