Skip to content

fix(sandbox): explicitly pass sandboxSessionKey to prevent Docker probes when mode is off (#73586)#73627

Closed
PratikRai0101 wants to merge 13 commits intoopenclaw:mainfrom
PratikRai0101:fix/sandbox-off-docker-probe
Closed

fix(sandbox): explicitly pass sandboxSessionKey to prevent Docker probes when mode is off (#73586)#73627
PratikRai0101 wants to merge 13 commits intoopenclaw:mainfrom
PratikRai0101:fix/sandbox-off-docker-probe

Conversation

@PratikRai0101
Copy link
Copy Markdown
Contributor

Summary

  • Problem: When agents.defaults.sandbox.mode is set to "off", isolated sessions (cron jobs, heartbeats) still trigger Docker capability probes (socket connection attempts and image inspections).
  • Why it matters: This causes startup failures and significant log noise for users without a running Docker daemon, effectively creating a hard dependency on Docker even when the feature is explicitly disabled.
  • What changed: 1. Updated src/cron/isolated-agent/run-executor.ts to explicitly pass sandboxSessionKey during runEmbeddedPiAgent calls.
    2. Added an explicit sandboxed status check at the top of resolveSandboxContext in src/agents/sandbox/context.ts to ensure no backend initialization occurs if sandboxing is disabled.
  • What did NOT change: The core logic for determining when a session should be sandboxed remains the same; we are simply ensuring the configuration is respected during the preflight phase of isolated runs.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: In isolated cron/heartbeat runs, the sandboxSessionKey was not explicitly passed, causing the resolver to fall back to the runSessionId (a UUID). Because this UUID didn't match the mainSessionKey, the logic incorrectly flagged the session as "non-main" and enabled sandboxing despite the global "off" setting.
  • Missing detection / guardrail: resolveSandboxContext lacked a high-level short-circuit to prevent backend factory resolution before validating the sandboxed status.
  • Contributing context: This was specifically visible on macOS environments where Docker is often managed via on-demand services (OrbStack/Docker Desktop).

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
  • Target test or file: src/cron/isolated-agent/docker-probe.regression.test.ts
  • Scenario the test should lock in: Mocking a failed Docker socket connection and ensuring resolveSandboxContext returns null without throwing when mode is "off".
  • Why this is the smallest reliable guardrail: It directly targets the interaction between session key resolution and backend initialization.

User-visible / Behavior Changes

  • Users with sandbox.mode: "off" will no longer see "failed to connect to docker API" errors in their gateway logs.
  • Isolated tasks will no longer fail or experience latency spikes due to redundant Docker probes on startup.

Diagram (if applicable)

Before:
[Cron Start] -> [UUID Key] -> [shouldSandbox check] -> [Result: True (Key mismatch)] -> [Docker Probe] -> [CRASH]

After:
[Cron Start] -> [Agent Key] -> [shouldSandbox check] -> [Result: False (Mode: off)] -> [Early Exit] -> [Success]

## Security Impact (required)

- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`No` - Actually removes unnecessary local socket calls)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)

## Repro + Verification

### Environment

- OS: macOS 26.3 (arm64) / Arch Linux
- Runtime/container: Node.js (host)
- Model/provider: Anthropic / OpenAI
- Relevant config: `agents.defaults.sandbox.mode: "off"`

### Steps

1. Set sandbox mode to "off" in `openclaw.json`.
2. Stop the local Docker daemon.
3. Trigger a cron-based isolated session.

### Expected
- Session executes on host immediately without checking Docker.

### Actual
- Session failed with `dial unix /var/run/docker.sock: connect: no such file or directory`.

## Evidence

- [x] Failing test/log before + passing after: `docker-probe.regression.test.ts` confirmed the leak was plugged.
- [x] Trace/log snippets: Verified `[gateway] ready` state is reached without socket errors.

## Human Verification (required)

- Verified scenarios: Local cron job execution with Docker service stopped.
- Edge cases checked: Verified sub-agent spawns also respect the explicit `sandboxSessionKey` handoff.
- What you did **not** verify: "all" mode (remains unchanged).

## Review Conversations

- [x] I replied to or resolved every bot review conversation I addressed in this PR.
- [x] I left unresolved only the conversations that still need reviewer or maintainer judgment.

## Compatibility / Migration

- Backward compatible? (`Yes`)
- Config/env changes? (`No`)
- Migration needed? (`No`)

## Risks and Mitigations

- Risk: Restructuring `resolveSandboxContext` might affect "non-main" auto-sandboxing.
- Mitigation: Added `sandbox.resolveSandboxContext.test.ts` to verify that standard sandboxing still triggers correctly when intended.

@PratikRai0101 PratikRai0101 requested a review from a team as a code owner April 28, 2026 15:14
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Apr 28, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR fixes a regression where cron/heartbeat isolated sessions triggered Docker capability probes even when sandbox.mode was explicitly set to "off". The core fix in run-executor.ts passes agentSessionKey as sandboxSessionKey to runEmbeddedPiAgent, preventing the session-key mismatch that caused resolveSandboxRuntimeStatus to incorrectly flag the run as needing sandbox. The context.ts change swaps getRuntimeConfig() for loadConfig(), which is a no-op since one is simply an alias for the other.

Confidence Score: 5/5

Safe to merge — the fix is narrow, well-tested, and the only behavioral change is passing the correct session key to avoid spurious Docker probes.

No P0 or P1 issues found. The getRuntimeConfig → loadConfig swap in context.ts is functionally identical (one calls the other). The sandboxSessionKey forwarding in run-executor.ts is the targeted fix, and the regression test directly exercises the failure mode.

No files require special attention. The existing P2 note about runCliAgent not receiving sandboxSessionKey was flagged in a prior review cycle.

Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/sandbox-off..." | Re-trigger Greptile

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: keeping this open for maintainer follow-up; there is still a little grit to resolve.

Keep this PR open. Current main still lacks the PR's central cron sandboxSessionKey handoff, the linked bug #73586 remains open, and this PR is still an active implementation candidate rather than cleanup-closeable.

Best possible solution:

Keep this PR open as the active implementation candidate for #73586. The maintainer path is to review the narrow sandboxSessionKey: params.agentSessionKey handoff, require regression coverage that proves Docker backend initialization is not reached for agents.defaults.sandbox.mode: "off", and verify off, non-main, and all behavior for isolated cron plus the heartbeat/sub-agent surfaces named in the linked bug.

What I checked:

  • Current main lacks the proposed cron sandbox policy handoff: createCronPromptExecutor calls runEmbeddedPiAgent with sessionKey: params.runSessionKey, but there is no sandboxSessionKey: params.agentSessionKey in the current main call. (src/cron/isolated-agent/run-executor.ts:166, 6d323ee73640)
  • Cron maintains separate canonical and run-scoped session keys: Cron derives a stable agentSessionKey, then uses ${agentSessionKey}:run:${runSessionId} for isolated cron: jobs. That matches the PR's stated goal of keeping transcript/run isolation while resolving sandbox policy from the canonical agent key. (src/cron/isolated-agent/run.ts:504, 6d323ee73640)
  • Embedded runner has an explicit sandbox policy key seam: runEmbeddedAttempt resolves sandbox policy from params.sandboxSessionKey?.trim() || params.sessionKey?.trim() || params.sessionId, so missing the cron handoff means the run-scoped key is used for sandbox resolution. (src/agents/pi-embedded-runner/run/attempt.ts:592, 6d323ee73640)
  • Existing main tests do not cover the new handoff: The current isolated cron identity test asserts that the embedded run receives the run-scoped sessionKey, but it does not assert sandboxSessionKey; the proposed docker-probe.regression.test.ts is not present on main. (src/cron/isolated-agent/run.session-key-isolation.test.ts:47, 6d323ee73640)
  • Sandbox off contract is real, but root cause should still be verified: Docs say sandboxing off means host execution, and current code returns false for cfg.mode === "off"; resolveSandboxContext returns before backend factory setup when runtime is not sandboxed. Review should confirm whether the reported Docker probe is from key policy, config propagation, or another preflight path before merge. (src/agents/sandbox/runtime-status.ts:16, 6d323ee73640)
  • Security-sensitive diff pass: The provided PR file list is limited to the cron executor, one cron regression test, and a sandbox context config fallback change. It does not touch workflows, dependency sources, lockfiles, install/build/release scripts, package publishing metadata, permissions, or secrets handling. The context.ts change still deserves functional review because current main falls back to getRuntimeConfig(), which is an alias for loadConfig(). (src/config/io.ts:2347, 6d323ee73640)

Remaining risk / open question:

  • The PR's stated root cause is not fully proven by current code alone: mode: "off" already short-circuits in shouldSandboxSession, so maintainers should require a focused failing test or trace proving which preflight path reaches Docker.
  • The src/agents/sandbox/context.ts diff replaces a runtime config fallback with params.config! in the sandbox browser auth path; reviewers should verify every browser-enabled sandbox caller passes config before accepting that cleanup.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6d323ee73640.

@PratikRai0101
Copy link
Copy Markdown
Contributor Author

@greptile-review

@steipete
Copy link
Copy Markdown
Contributor

Thanks @PratikRai0101 for jumping on this. Closing this PR as superseded rather than rejected.

The linked issue #73586 is now fixed on main by PR #73671 plus the ClawSweeper follow-up #74520:

Those landed fixes take a slightly different shape: current main guarantees sandbox.mode: "off" does not initialize sandbox backends, and Docker-daemon-unavailable errors fail before container creation when Docker sandboxing is actually required. The current tests cover the sandbox-off no-backend path and the daemon-unavailable Docker/browser paths, so we do not need to carry this alternate branch forward.

@steipete steipete closed this Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

2 participants