feat: sandbox security improvements (auth proxy, gVisor default, 4-domain policy, Chainguard packages)

## Context

This issue tracks four sandbox security improvements identified during deep-dive research across Docker Sandboxes, NVIDIA OpenShell, and LangSmith Sandboxes.

### Finding 1: Auth Proxy Pattern
[LangSmith Sandboxes](https://blog.langchain.com/introducing-langsmith-sandboxes-secure-code-execution-for-agents/) + [NVIDIA OpenShell Privacy Router](https://github.com/NVIDIA/OpenShell): All external service calls are routed through an authentication proxy, so credentials never enter the sandbox runtime. Three independent implementations converged on this pattern.
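
A minimal sketch of the credential-injection step, to make the pattern concrete. It assumes the proxy runs on the host and sees requests that sandboxed code emits without credentials; `SandboxRequest`, `attach_credentials`, and `PROVIDER_KEYS` are hypothetical names for illustration, not existing SynthOrg, LangSmith, or OpenShell APIs.

```python
from dataclasses import dataclass, field

# Hypothetical host-side key store. In practice this would be a secrets
# manager; the point is that it lives outside the sandbox.
PROVIDER_KEYS = {"example-llm": "sk-host-only-secret"}


@dataclass(frozen=True)
class SandboxRequest:
    """A request as emitted from inside the sandbox: no credentials attached."""

    provider: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)


def attach_credentials(request: SandboxRequest, keys: dict[str, str]) -> dict[str, str]:
    """Return the headers the proxy would forward upstream."""
    if request.provider not in keys:
        raise PermissionError(f"provider {request.provider!r} is not allowed")
    # Drop any Authorization header the sandbox tried to set itself.
    headers = {
        name: value
        for name, value in request.headers.items()
        if name.lower() != "authorization"
    }
    # The real key is added only here, on the host side of the boundary.
    headers["Authorization"] = f"Bearer {keys[request.provider]}"
    return headers
```

Because the key is attached only at forwarding time, a compromised sandbox process can at most issue requests through the proxy; it cannot read or exfiltrate the key itself.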

### Finding 2: Chainguard Packages
[Chainguard OS Packages](https://thenewstack.io/chainguard-os-packages-containers/): 30k zero-CVE OS packages for building custom hardened images. Extends current Chainguard distroless usage for sandbox images.

### Finding 3: gVisor Default for High-Risk Tools
The deep-dive comparison found that gVisor (runsc) provides syscall-level isolation whose strength sits between a standard container and a MicroVM. The `DockerSandboxConfig.runtime` field already exists -- we just need to change the factory default for the `code_execution` and `terminal` tool categories.
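
A sketch of how the per-category default could look, including the fallback to `"runc"` called for in the action items. The function name, the `HIGH_RISK_CATEGORIES` constant, and the `available_runtimes` parameter are assumptions for illustration; the actual shape of `factory.py` is not shown in this issue, and how runtime availability is detected is left out.

```python
# Tool categories named in this issue as high-risk.
HIGH_RISK_CATEGORIES = frozenset({"code_execution", "terminal"})


def default_runtime(tool_category: str, available_runtimes: set[str]) -> str:
    """Pick the container runtime for a tool category.

    `available_runtimes` is assumed to come from a separate health check
    that reports which runtimes the Docker daemon has registered.
    """
    if tool_category in HIGH_RISK_CATEGORIES and "runsc" in available_runtimes:
        return "runsc"
    # Either a lower-risk category, or gVisor is not installed on this host.
    return "runc"
```

Whether a silent fallback is acceptable for high-risk tools, or whether it should at least log a warning, is a design decision this sketch does not settle.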

### Finding 4: 4-Domain Policy Model
Adapted from OpenShell's declarative YAML policy engine. Extend `DockerSandboxConfig` with a `SandboxPolicy` model covering filesystem/network/process/inference domains. The inference domain (rerouting LLM calls) directly maps to the auth proxy pattern. Hot-reload via `SettingsChangeDispatcher`.
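
A rough sketch of the four-domain shape. It uses stdlib dataclasses as a stand-in for the planned Pydantic model so that it runs without dependencies. Only the four domain names and the legacy field names (`network`, `allowed_hosts`, `dns_allowed`, `loopback_allowed`, taken from the action items in this issue) come from the source; every other field, default, and the `policy_from_legacy` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FilesystemPolicy:
    writable_paths: tuple[str, ...] = ("/workspace",)  # assumed default
    read_only_paths: tuple[str, ...] = ()


@dataclass(frozen=True)
class NetworkPolicy:
    mode: str = "none"  # assumed to mirror the legacy `network` field
    allowed_hosts: tuple[str, ...] = ()
    dns_allowed: bool = False
    loopback_allowed: bool = False


@dataclass(frozen=True)
class ProcessPolicy:
    max_processes: int = 64  # assumed limit
    allow_privilege_escalation: bool = False


@dataclass(frozen=True)
class InferencePolicy:
    # When True, LLM calls are rerouted through the host-side auth proxy.
    route_via_proxy: bool = True


@dataclass(frozen=True)
class SandboxPolicy:
    filesystem: FilesystemPolicy = field(default_factory=FilesystemPolicy)
    network: NetworkPolicy = field(default_factory=NetworkPolicy)
    process: ProcessPolicy = field(default_factory=ProcessPolicy)
    inference: InferencePolicy = field(default_factory=InferencePolicy)


def policy_from_legacy(legacy: dict[str, Any]) -> SandboxPolicy:
    """Map the existing flat network fields into the network domain."""
    return SandboxPolicy(
        network=NetworkPolicy(
            mode=str(legacy.get("network", "none")),
            allowed_hosts=tuple(legacy.get("allowed_hosts", ())),
            dns_allowed=bool(legacy.get("dns_allowed", False)),
            loopback_allowed=bool(legacy.get("loopback_allowed", False)),
        )
    )
```

Keeping the domains as separate frozen objects means a hot-reload can swap in a whole new `SandboxPolicy` instead of mutating one in place.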

## Deep Dive Verdict

**Keep current sandbox architecture + add these four targeted upgrades.** Every external option evaluated (Docker Sandboxes, OpenShell, LangSmith) is Linux-incompatible for MicroVM, alpha-stage, or hosted-only SaaS. The recommendation is pattern adoption, not wholesale replacement.

## Action Items

- [ ] **gVisor default** (1-2 days): Change factory default to `"runsc"` for `code_execution` and `terminal` in `factory.py`. Add a health-check fallback to `"runc"` if gVisor is unavailable.
- [ ] **Auth proxy / SandboxCredentialManager** (3-5 days): Audit all `env_overrides` call sites that pass provider API keys. Design an abstraction that routes LLM-bound traffic from containers through SynthOrg's provider layer, so credentials never enter the sandbox env.
- [ ] **SandboxPolicy 4-domain model** (2-3 days): Extend `DockerSandboxConfig` with a Pydantic model for the filesystem/network/process/inference domains. Provide a backwards-compatible migration of the existing `network`, `allowed_hosts`, `dns_allowed`, and `loopback_allowed` fields.
- [ ] **Chainguard Packages evaluation**: Evaluate for sandbox image when custom packages needed beyond standard distroless variants.
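
For the `env_overrides` audit, a small helper along these lines could flag credential-like variables before they reach a container. This is only a sketch: the name pattern is a heuristic assumption and would miss keys with unusual names, and `split_env_overrides` is a hypothetical helper, not existing code.

```python
import re

# Heuristic: variable names that usually carry secrets.
CREDENTIAL_NAME_PATTERN = re.compile(r"(API_KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)


def split_env_overrides(env_overrides: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Separate safe variables from credential-like ones.

    Returns the variables that may enter the sandbox and the sorted names
    of those that were held back.
    """
    safe: dict[str, str] = {}
    blocked: list[str] = []
    for name, value in env_overrides.items():
        if CREDENTIAL_NAME_PATTERN.search(name):
            blocked.append(name)
        else:
            safe[name] = value
    return safe, sorted(blocked)
```

Logging the blocked names (never the values) at each call site would give a quick inventory of where provider keys currently leak into the sandbox env.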

## References

- [LangSmith Sandboxes](https://blog.langchain.com/introducing-langsmith-sandboxes-secure-code-execution-for-agents/)
- [Chainguard OS Packages](https://thenewstack.io/chainguard-os-packages-containers/)
- [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell) -- Privacy Router + 4-domain policy
- [Docker Sandboxes Architecture](https://docs.docker.com/ai/sandboxes/architecture/) -- MicroVM reference (macOS only, no Linux)
- Deep dive: sandbox architecture comparison (2026-03-22)

---

## Additional Research (2026-03-26)

### Reward Hacking Categories Reference
**Source**: [LongCat-Flash-Prover (arXiv:2603.21065)](https://huggingface.co/papers/2603.21065)

Nine categories of reward hacking / agent cheating discovered during formal theorem proving, directly applicable as a reference checklist for our security rule engine:

1. **Proposition tampering** -- agent modifies the problem statement
2. **Early termination** -- agent declares success prematurely
3. **Unproven assumptions** -- agent asserts facts without evidence
4. **Context pollution** -- agent injects misleading context
5. **Command injection** -- agent embeds commands in data
6. **Tautological proofs** -- agent proves trivially true statements
7. **Circular reasoning** -- agent uses conclusion as premise
8. **Scope reduction** -- agent solves a simpler version of the problem
9. **Format exploitation** -- agent exploits output format expectations

**AST-based legality detection** validates agent-generated code before execution. The same pattern is applicable to our security rule engine for validating tool call outputs.
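
As an illustration of the general idea (not the paper's implementation), a pre-execution check can parse agent-generated Python with the stdlib `ast` module and reject code that contains disallowed constructs. The deny-lists below are placeholder assumptions, and a deny-list alone is not a complete defense.

```python
import ast

# Placeholder deny-lists; a real rule engine would load these from policy.
FORBIDDEN_CALLS = frozenset({"eval", "exec", "compile", "__import__"})
FORBIDDEN_MODULES = frozenset({"os", "subprocess", "socket"})


def find_violations(source: str) -> list[str]:
    """Return human-readable rule violations; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"line {node.lineno}: import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name are caught; aliased calls are not.
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

The check runs on the syntax tree and never executes the code, so it is safe to apply on the host before anything is handed to the sandbox.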

Code: [github.com/meituan-longcat/LongCat-Flash-Prover](https://github.com/meituan-longcat/LongCat-Flash-Prover)
