feat: sandbox security improvements (auth proxy, gVisor default, 4-domain policy, Chainguard packages) #696

@Aureliolo

Context

This issue proposes four sandbox security improvements identified through deep-dive research into Docker Sandboxes, NVIDIA OpenShell, and LangSmith Sandboxes.

Finding 1: Auth Proxy Pattern

LangSmith Sandboxes + NVIDIA OpenShell Privacy Router: All external service calls are routed through an authentication proxy, so credentials never enter the sandbox runtime. Three independent implementations converged on this pattern.
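
A minimal sketch of the proxy's credential-injection step, assuming the sandbox addresses providers by a logical name and the proxy runs on the host side. `ProxyRequest`, `inject_credentials`, and `HOST_CREDENTIALS` are hypothetical names for illustration, not existing SynthOrg APIs:

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical host-side credential store (in practice: the provider layer's
# secret storage). These values are never passed into the sandbox.
HOST_CREDENTIALS: dict[str, str] = {"example-llm": "sk-host-only"}

# Auth headers a sandboxed client must never be able to set itself.
_AUTH_HEADERS = {"authorization", "x-api-key", "proxy-authorization"}


@dataclass
class ProxyRequest:
    provider: str  # logical provider name chosen inside the sandbox
    path: str      # upstream path, e.g. "/v1/chat"
    headers: dict[str, str] = field(default_factory=dict)


def inject_credentials(req: ProxyRequest, store: dict[str, str]) -> ProxyRequest:
    """Return a copy of req with sandbox-supplied auth stripped and host auth added."""
    if req.provider not in store:
        # Fail closed: unknown providers are rejected, not forwarded.
        raise PermissionError(f"provider not allowed: {req.provider}")
    clean = {k: v for k, v in req.headers.items() if k.lower() not in _AUTH_HEADERS}
    clean["Authorization"] = f"Bearer {store[req.provider]}"
    return ProxyRequest(provider=req.provider, path=req.path, headers=clean)
```

The two properties that matter are that the proxy fails closed on unknown providers and that any credential the sandbox tries to supply is discarded rather than forwarded.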

Finding 2: Chainguard Packages

Chainguard OS Packages: 30k zero-CVE OS packages for building custom hardened images. These would extend our current use of Chainguard distroless images for the sandbox.

Finding 3: gVisor Default for High-Risk Tools

The deep-dive comparison found that gVisor (runsc) provides syscall-level isolation, with strength between that of a standard container and a MicroVM. The DockerSandboxConfig.runtime field already exists, so only the factory default for the code_execution and terminal tool categories needs to change.
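
A minimal sketch of the proposed default with the health-check fallback. `select_runtime` is a hypothetical helper, not the existing code in factory.py; the set of available runtimes is assumed to come from a host health check (for example, the runtimes the Docker daemon reports), and that wiring is left out:

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

# Tool categories the issue proposes to run under gVisor by default.
HIGH_RISK_CATEGORIES = frozenset({"code_execution", "terminal"})

GVISOR_RUNTIME = "runsc"
FALLBACK_RUNTIME = "runc"  # Docker's standard OCI runtime


def select_runtime(category: str, available_runtimes: set[str]) -> str:
    """Choose the container runtime for a tool category."""
    if category not in HIGH_RISK_CATEGORIES:
        # Assumption: other categories keep runc, Docker's default runtime.
        return FALLBACK_RUNTIME
    if GVISOR_RUNTIME in available_runtimes:
        return GVISOR_RUNTIME
    # Fallback from the action items; log it so the weaker isolation is visible.
    logger.warning("gVisor (runsc) unavailable; falling back to runc for %s", category)
    return FALLBACK_RUNTIME
```

Logging the fallback (or making it configurable) keeps a silent security downgrade from going unnoticed on hosts without gVisor installed.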

Finding 4: 4-Domain Policy Model

Adapted from OpenShell's declarative YAML policy engine: extend DockerSandboxConfig with a SandboxPolicy model covering the filesystem, network, process, and inference domains. The inference domain (rerouting LLM calls) maps directly to the auth proxy pattern. Policy changes would hot-reload via SettingsChangeDispatcher.
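
A minimal sketch of the four-domain shape, assuming policies arrive as a parsed YAML/JSON mapping. It uses stdlib `dataclasses` so it runs without dependencies; the issue calls for a Pydantic model, where `ConfigDict(frozen=True, extra="forbid")` would give similar behavior. Every per-domain field other than the existing network fields is a hypothetical placeholder:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class FilesystemPolicy:
    read_only: tuple[str, ...] = ("/",)             # hypothetical
    read_write: tuple[str, ...] = ("/workspace",)   # hypothetical writable path


@dataclass(frozen=True)
class NetworkPolicy:
    # Existing DockerSandboxConfig fields, regrouped under one domain.
    network: str = "none"
    allowed_hosts: tuple[str, ...] = ()
    dns_allowed: bool = False
    loopback_allowed: bool = True

    def allows(self, host: str) -> bool:
        return host in self.allowed_hosts


@dataclass(frozen=True)
class ProcessPolicy:
    max_pids: int = 256                      # hypothetical
    allow_privilege_escalation: bool = False  # hypothetical


@dataclass(frozen=True)
class InferencePolicy:
    # LLM calls leave the sandbox only through the host-side auth proxy.
    route_via_proxy: bool = True
    allowed_providers: tuple[str, ...] = ()


@dataclass(frozen=True)
class SandboxPolicy:
    filesystem: FilesystemPolicy = field(default_factory=FilesystemPolicy)
    network: NetworkPolicy = field(default_factory=NetworkPolicy)
    process: ProcessPolicy = field(default_factory=ProcessPolicy)
    inference: InferencePolicy = field(default_factory=InferencePolicy)

    @classmethod
    def from_mapping(cls, data: dict) -> SandboxPolicy:
        """Build a policy from a parsed mapping; missing domains use defaults.

        Unknown keys raise TypeError, so a typo fails closed instead of being ignored.
        """
        def tuples(d: dict) -> dict:
            return {k: tuple(v) if isinstance(v, list) else v for k, v in d.items()}

        return cls(
            filesystem=FilesystemPolicy(**tuples(data.get("filesystem", {}))),
            network=NetworkPolicy(**tuples(data.get("network", {}))),
            process=ProcessPolicy(**tuples(data.get("process", {}))),
            inference=InferencePolicy(**tuples(data.get("inference", {}))),
        )
```

Because instances are frozen, a hot reload would build a new policy object and swap it in rather than mutate the one a running sandbox is using.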

Deep Dive Verdict

Keep the current sandbox architecture and add these four targeted upgrades. Every external option evaluated (Docker Sandboxes, OpenShell, LangSmith) is either unsupported on Linux for MicroVM isolation, alpha-stage, or hosted-only SaaS. The goal is pattern adoption, not wholesale replacement.

Action Items

  • gVisor default (1-2 days): Change the factory default to "runsc" for the code_execution and terminal categories in factory.py. Add a health-check fallback to "runc" if gVisor is unavailable.
  • Auth proxy / SandboxCredentialManager (3-5 days): Audit all env_overrides call sites that pass provider API keys. Design an abstraction that routes LLM-bound traffic from containers through SynthOrg's provider layer, so credentials never enter the sandbox environment.
  • SandboxPolicy 4-domain model (2-3 days): Extend DockerSandboxConfig with a Pydantic model covering the filesystem, network, process, and inference domains. Migrate the existing network, allowed_hosts, dns_allowed, and loopback_allowed fields in a backward-compatible way.
  • Chainguard Packages evaluation: Evaluate Chainguard Packages for the sandbox image when custom packages are needed beyond the standard distroless variants.
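
For the backward-compatible migration in the SandboxPolicy item, a minimal sketch that rewrites a raw config mapping, assuming the new model nests the legacy fields under `policy.network`. `migrate_legacy_config` and that nesting are hypothetical; the precedence rule (explicit policy values beat legacy flat fields) is also an assumption:

```python
from __future__ import annotations

from typing import Any

# Flat DockerSandboxConfig fields named in the issue that move under the network domain.
LEGACY_NETWORK_FIELDS = ("network", "allowed_hosts", "dns_allowed", "loopback_allowed")


def migrate_legacy_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Move legacy flat network fields into policy.network; returns a new dict.

    Fields not listed above pass through unchanged. If a key is set both as a
    legacy field and under policy.network, the explicit policy value wins.
    """
    out = {k: v for k, v in raw.items() if k not in LEGACY_NETWORK_FIELDS}
    legacy = {k: raw[k] for k in LEGACY_NETWORK_FIELDS if k in raw}
    policy = dict(out.get("policy", {}))                # shallow copy; input untouched
    network = {**legacy, **policy.get("network", {})}   # policy values override legacy
    if network:
        policy["network"] = network
    if policy:
        out["policy"] = policy
    return out
```

Running the migration on load, rather than rewriting stored configs, lets old and new settings coexist during the transition.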

Additional Research (2026-03-26)

Reward Hacking Categories Reference

Source: LongCat-Flash-Prover (arXiv:2603.21065)

Nine categories of reward hacking (agent cheating) observed during formal theorem proving. They apply directly as a reference checklist for the security rule engine:

  1. Proposition tampering -- agent modifies the problem statement
  2. Early termination -- agent declares success prematurely
  3. Unproven assumptions -- agent asserts facts without evidence
  4. Context pollution -- agent injects misleading context
  5. Command injection -- agent embeds commands in data
  6. Tautological proofs -- agent proves trivially true statements
  7. Circular reasoning -- agent uses conclusion as premise
  8. Scope reduction -- agent solves a simpler version of the problem
  9. Format exploitation -- agent exploits output format expectations
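
A minimal sketch of how the checklist could be encoded for the rule engine, with one concrete check. `RewardHack` and `check_proposition_tampering` are hypothetical names. Only category 1 is implemented, as a hash comparison of the whitespace-normalized task statement; this is a simple illustrative heuristic, not the detection method from the paper:

```python
from __future__ import annotations

import hashlib
from enum import Enum


class RewardHack(Enum):
    """The nine categories listed above, as identifiers a rule engine can report."""
    PROPOSITION_TAMPERING = 1
    EARLY_TERMINATION = 2
    UNPROVEN_ASSUMPTIONS = 3
    CONTEXT_POLLUTION = 4
    COMMAND_INJECTION = 5
    TAUTOLOGICAL_PROOFS = 6
    CIRCULAR_REASONING = 7
    SCOPE_REDUCTION = 8
    FORMAT_EXPLOITATION = 9


def fingerprint(task_statement: str) -> str:
    # Collapse whitespace so reformatting alone is not flagged as tampering.
    normalized = " ".join(task_statement.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_proposition_tampering(original: str, submitted: str) -> list[RewardHack]:
    """Flag category 1 if the task the agent claims to have solved differs from the one assigned."""
    if fingerprint(original) != fingerprint(submitted):
        return [RewardHack.PROPOSITION_TAMPERING]
    return []
```

Returning a list lets each check report zero or more categories, so further checks can be added and their results concatenated.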

LongCat-Flash-Prover uses AST-based legality detection to validate agent-generated code before execution. The same pattern applies to our security rule engine for validating tool call outputs.
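
A minimal sketch of the idea applied to Python tool-call code with the standard-library `ast` module: the code is parsed, never executed, and the tree is scanned for denied imports and calls. The deny-lists and `find_violations` are hypothetical, and this is not LongCat-Flash-Prover's checker:

```python
from __future__ import annotations

import ast

# Hypothetical deny-lists; a real rule engine would load these from policy.
FORBIDDEN_IMPORTS = frozenset({"os", "subprocess", "socket", "ctypes"})
FORBIDDEN_CALLS = frozenset({"eval", "exec", "compile", "__import__"})


def find_violations(source: str) -> list[str]:
    """Parse code without executing it and report denied imports and calls."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_IMPORTS:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            if (node.module or "").split(".")[0] in FORBIDDEN_IMPORTS:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"call {node.func.id}()")
    return violations
```

A static deny-list is easy to evade (for example via `getattr` or `importlib`), so a check like this works as a cheap first filter alongside runtime isolation such as gVisor, not as a replacement for it.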

Code: github.com/meituan-longcat/LongCat-Flash-Prover

Metadata

Assignees

No one assigned

Labels

  • prio:high (Important, should be prioritized)
  • scope:medium (1-3 days of work)
  • spec:security (DESIGN_SPEC Section 12 - Security & Approval System)
  • spec:tools (DESIGN_SPEC Section 11 - Tool & Capability System)
  • type:feature (New feature implementation)
  • v0.7 (Minor version v0.7)
  • v0.7.0 (Patch release v0.7.0)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests