Skip to content

fix(sandbox): gracefully handle Docker daemon unavailability when sandbox mode is off#73671

Merged
sallyom merged 9 commits intoopenclaw:mainfrom
kaseonedge:fix/73586-sandbox-mode-off-docker-probe
Apr 29, 2026
Merged

fix(sandbox): gracefully handle Docker daemon unavailability when sandbox mode is off#73671
sallyom merged 9 commits intoopenclaw:mainfrom
kaseonedge:fix/73586-sandbox-mode-off-docker-probe

Conversation

@kaseonedge
Copy link
Copy Markdown
Contributor

Bug #73586

When sandbox.mode: "off" is configured, session startup still triggers a Docker capability probe (docker image inspect) that fails when the Docker daemon is unavailable:

Error: Failed to inspect sandbox image: failed to connect to the docker API
at unix:///Users/macmini/.orbstack/run/docker.sock: dial unix
/Users/macmini/.orbstack/run/docker.sock: connect: no such file or directory

Root Cause

dockerImageExists() in src/agents/sandbox/docker.ts threw when Docker daemon was unavailable, even though resolveSandboxContext correctly returns null for sandbox.mode="off". The probe error bubbled up during session startup.

Fix

Added isDockerDaemonUnavailable() helper to detect Docker connectivity errors (missing socket, connection refused, etc.) and return false gracefully instead of throwing. Applied to:

  • dockerImageExists() — returns false when daemon is down
  • ensureDockerImage() — skips pull when daemon is unavailable
  • ensureSandboxBrowserImage() in browser.ts — same graceful handling
  • dockerImageExists() in doctor-sandbox.ts — same graceful handling

Files Changed

  • src/agents/sandbox/docker.ts (+17 lines)
  • src/agents/sandbox/browser.ts (+17 lines)
  • src/commands/doctor-sandbox.ts (+10 lines)

Testing

  • Fix is defensive: catches Docker daemon error patterns and returns false
  • Does not change behavior when Docker is available
  • No behavior change for sandbox.mode="all" or sandbox.mode="non-main"
  • Committed and pushed from fork for CI validation

@kaseonedge kaseonedge requested a review from a team as a code owner April 28, 2026 16:27
@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations agents Agent runtime and tooling size: S labels Apr 28, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds graceful handling for Docker daemon unavailability across docker.ts, browser.ts, and doctor-sandbox.ts so that sandbox.mode="off" sessions no longer crash on startup when Docker isn't running.

  • P1 – incomplete fix for custom images: In ensureDockerImage, when a user configures a non-default sandbox image, dockerImageExists silently returns false on daemon unavailability, then the function falls through to throw new Error(\Sandbox image not found: ${image}`). The daemon guard only covers the DEFAULT_SANDBOX_IMAGE` pull branch, leaving custom-image users with the same startup crash.
  • P2 – duplicated detection logic: isDockerDaemonUnavailable is defined twice as a named function (with different names in docker.ts vs browser.ts) and inlined a third time in doctor-sandbox.ts; a shared utility would keep drift from creeping in.

Confidence Score: 3/5

Not safe to merge as-is — the fix is incomplete and custom-image users will still see the crash it aims to prevent.

One P1 defect: the non-default image path in ensureDockerImage still throws when the Docker daemon is down, meaning the bug is only partially fixed. Two P2 issues (duplicated helper, broad heuristic string) are non-blocking but worth addressing.

src/agents/sandbox/docker.ts — the ensureDockerImage fallthrough for non-default images needs a daemon-unavailability guard matching the one added for the default image.

Comments Outside Diff (1)

  1. src/agents/sandbox/docker.ts, line 326-327 (link)

    P1 Custom images still throw when daemon is unavailable

    When image is a user-configured (non-default) image, dockerImageExists returns false due to daemon unavailability, but then the function falls through to throw new Error(\Sandbox image not found: ${image}. Build or pull it first.`). The daemon check only covers the DEFAULT_SANDBOX_IMAGEpull path, so any session that configures a custom sandbox image still surfaces the original crash when Docker is down andsandbox.mode="off"`.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/agents/sandbox/docker.ts
    Line: 326-327
    
    Comment:
    **Custom images still throw when daemon is unavailable**
    
    When `image` is a user-configured (non-default) image, `dockerImageExists` returns `false` due to daemon unavailability, but then the function falls through to `throw new Error(\`Sandbox image not found: ${image}. Build or pull it first.\`)`. The daemon check only covers the `DEFAULT_SANDBOX_IMAGE` pull path, so any session that configures a custom sandbox image still surfaces the original crash when Docker is down and `sandbox.mode="off"`.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/sandbox/docker.ts
Line: 326-327

Comment:
**Custom images still throw when daemon is unavailable**

When `image` is a user-configured (non-default) image, `dockerImageExists` returns `false` due to daemon unavailability, but then the function falls through to `throw new Error(\`Sandbox image not found: ${image}. Build or pull it first.\`)`. The daemon check only covers the `DEFAULT_SANDBOX_IMAGE` pull path, so any session that configures a custom sandbox image still surfaces the original crash when Docker is down and `sandbox.mode="off"`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/sandbox/browser.ts
Line: 126-135

Comment:
**Duplicated helper with a different name**

`isDockerDaemonUnavailableBrowser` is byte-for-byte identical to `isDockerDaemonUnavailable` in `docker.ts`, and the same logic is inlined a third time in `doctor-sandbox.ts`. Duplicating this across three files means any future change to the detection heuristics (e.g. adding a new Docker-Desktop-specific message) must be applied in all three places. Consider extracting to a shared utility, e.g. `src/agents/sandbox/docker-utils.ts`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/sandbox/docker.ts
Line: 279

Comment:
**Overly-broad heuristic string**

`"no such file or directory"` is a generic POSIX error string. In the context of `image inspect` it almost always means a missing socket, but on some systems it can appear in unrelated Docker error output (e.g. a bind-mount whose host path doesn't exist). Pairing it with the already-present `"dial unix"` check covers the missing-socket case; the standalone string can silently swallow real Docker errors. The `"dial unix"` and `"cannot connect to the docker daemon"` checks are sufficient for the stated goal.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(sandbox): gracefully handle Docker d..." | Re-trigger Greptile

Comment thread src/agents/sandbox/browser.ts Outdated
Comment thread src/agents/sandbox/docker.ts Outdated
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs maintainer review before merge.

What this changes:

The PR exports a shared Docker-daemon-unavailable classifier, applies it to sandbox, browser, and doctor Docker image-inspect paths, and adds a default-image inspection regression test.

Maintainer follow-up before merge:

This is an active open implementation PR assigned to a maintainer, touches secops-owned sandbox failure semantics, and has an alternate open fix candidate; the remaining action is maintainer selection, review, and validation rather than an automated replacement PR.

Best possible solution:

Keep the PR open for maintainer and secops review, choose the intended fix shape for #73586, and land only after mode-off sessions are proven not to initialize Docker while sandbox-enabled sessions still surface Docker availability failures clearly.

Acceptance criteria:

  • pnpm test src/agents/sandbox/docker.test.ts src/agents/sandbox.resolveSandboxContext.test.ts src/commands/doctor-sandbox.warns-sandbox-enabled-without-docker.test.ts
  • Add or run focused regression coverage for browser sandbox image setup and the cron/heartbeat/sub-agent mode-off reproduction before merge.
  • Run pnpm check:changed in Testbox before handoff if the PR is updated for merge.

What I checked:

Likely related people:

  • sallyom: Live PR metadata shows sallyom is assigned, and the latest PR head was force-pushed by sallyom in the provided timeline, making them the current review and follow-up contact for this implementation candidate. (role: recent PR follow-up owner; confidence: medium; commits: 378851cf4053, d64463937045, 050b2f9a9fae; files: src/agents/sandbox/docker.ts, src/agents/sandbox/browser.ts, src/commands/doctor-sandbox.ts)
  • Masato Hoshino: Current checkout blame attributes the visible Docker image-probe, browser image-probe, and doctor image-probe lines to commit 016f5ae; this is routing context, not fault attribution. (role: recent current-main maintainer; confidence: medium; commits: 016f5ae862e1; files: src/agents/sandbox/docker.ts, src/agents/sandbox/browser.ts, src/commands/doctor-sandbox.ts)
  • steipete: The v2026.4.26 release commit contains the central sandbox Docker, browser, context, and doctor files that match the version named by the linked bug, so steipete is relevant for shipped-behavior provenance. (role: release snapshot maintainer; confidence: medium; commits: be8c24633aaa; files: src/agents/sandbox/docker.ts, src/agents/sandbox/browser.ts, src/agents/sandbox/context.ts)

Remaining risk / open question:

  • The PR treats Docker-daemon-unavailable image inspection as success in ensureDockerImage, so maintainers should verify required sandbox runs still fail clearly when Docker is actually needed.
  • Regression coverage is narrow: the PR adds one default-image inspection test but does not cover the cron, heartbeat, sub-agent mode-off path, browser image setup, doctor behavior, or custom images.
  • There are two open fix shapes for the same linked bug, fix(sandbox): explicitly pass sandboxSessionKey to prevent Docker probes when mode is off (#73586) #73627 and this PR; maintainers should choose deliberately between the root-cause sandbox policy-key handoff and the broader Docker probe fallback.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0544c6d493ff.

@kaseonedge kaseonedge force-pushed the fix/73586-sandbox-mode-off-docker-probe branch from 9da403c to 25c740a Compare April 28, 2026 18:39
kaseonedge pushed a commit to kaseonedge/openclaw that referenced this pull request Apr 28, 2026
…port

- Export from docker.ts, import in browser.ts and doctor-sandbox.ts
- Eliminates byte-for-byte duplicated logic across three files
- Addresses codex review feedback on PR openclaw#73671
@sallyom sallyom self-assigned this Apr 29, 2026
@sallyom sallyom force-pushed the fix/73586-sandbox-mode-off-docker-probe branch from 20ea194 to 644cf0b Compare April 29, 2026 05:08
@openclaw-barnacle openclaw-barnacle Bot added docker Docker and sandbox tooling size: S and removed size: XS labels Apr 29, 2026
@sallyom sallyom force-pushed the fix/73586-sandbox-mode-off-docker-probe branch 2 times, most recently from d644639 to 050b2f9 Compare April 29, 2026 05:41
metal and others added 6 commits April 29, 2026 09:25
ensureDockerImage now tries to pull custom images and gracefully returns
when the Docker daemon is down, instead of throwing 'Sandbox image not
found'. This closes the gap identified in the PR review where only
DEFAULT_SANDBOX_IMAGE was guarded.
…port

- Export from docker.ts, import in browser.ts and doctor-sandbox.ts
- Eliminates byte-for-byte duplicated logic across three files
- Addresses codex review feedback on PR openclaw#73671
Introduced during isDockerDaemonUnavailable consolidation refactor.
Fixes CI build failures from duplicate identifier declarations.
The duplicate import of SandboxConfig from types.js was causing
a PARSE_ERROR (identifier already declared) that broke CI.

Co-authored-by: Coder <coder@openclaw.ai>
- Remove silent pull-and-tag for DEFAULT_SANDBOX_IMAGE in ensureDockerImage
  The original error message correctly states OpenClaw won't substitute
  plain debian:bookworm-slim; silent substitution undermines this intent.
  Docker daemon unavailability is still handled gracefully in
  dockerImageExists().

- Consolidate mid-file type imports in browser.ts
  Move SandboxBrowserConfig to top-level import block alongside other
  types, eliminating the orphan import statement.
metal and others added 3 commits April 29, 2026 09:25
…socket paths

Greptile review feedback: the bare string is a generic POSIX error.
Now requires it to appear alongside '/var/run/docker' or 'docker.sock'
to avoid false positives.
… isDockerDaemonUnavailable

Greptile review: 'dial unix' already covers missing Docker socket case
(e.g. 'dial unix /var/run/docker.sock: connect: no such file or directory').
The bare string is also too generic and could cause false positives.
@sallyom sallyom force-pushed the fix/73586-sandbox-mode-off-docker-probe branch from 050b2f9 to 378851c Compare April 29, 2026 14:07
@sallyom sallyom merged commit 2dadc82 into openclaw:main Apr 29, 2026
75 checks passed
@sallyom
Copy link
Copy Markdown
Contributor

sallyom commented Apr 29, 2026

Merged in 2dadc82.

Prepared head: 378851c

vincentkoc added a commit that referenced this pull request Apr 29, 2026
Adds six missing entries for commits that landed without their own
CHANGELOG.md update, picked from the last six hours of origin/main and
attributed to the original contributors.

Changes:
- Control UI/i18n locale registry expansion + new docs glossaries
  (297f4c6, 0126692 by @vincentkoc).
- Gateway/diagnostics opt-in startup timeline (097eed8, d001c34,
  e69da9d by @shakkernerd).

Fixes:
- Matrix `verify confirm-sas` cross-signing close (86956f7 by
  @nklock; #74542).
- `openclaw status` channel context-window overrides (eb7d89f by
  @HemantSudarshan).
- Sandbox Docker daemon graceful when sandbox mode is off (2dadc82
  by @kaseonedge; #73671).
- Control UI mobile chat settings persisted via Lit state (b1c5152
  by @BunsDev).

Skipped Peter-only commits with no external collaborator (per the
maintainer-attribution rule against thanking @steipete) and the model
list auth-index series (already covered by the existing "Models/UI:
hide unauthenticated providers" entry).
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
…dbox mode is off (openclaw#73671)

Merged via squash.

Prepared head SHA: 378851c
Co-authored-by: kaseonedge <15183881+kaseonedge@users.noreply.github.com>
Co-authored-by: sallyom <11166065+sallyom@users.noreply.github.com>
Reviewed-by: @sallyom
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
Adds six missing entries for commits that landed without their own
CHANGELOG.md update, picked from the last six hours of origin/main and
attributed to the original contributors.

Changes:
- Control UI/i18n locale registry expansion + new docs glossaries
  (297f4c6, 0126692 by @vincentkoc).
- Gateway/diagnostics opt-in startup timeline (097eed8, d001c34,
  e69da9d by @shakkernerd).

Fixes:
- Matrix `verify confirm-sas` cross-signing close (86956f7 by
  @nklock; openclaw#74542).
- `openclaw status` channel context-window overrides (eb7d89f by
  @HemantSudarshan).
- Sandbox Docker daemon graceful when sandbox mode is off (2dadc82
  by @kaseonedge; openclaw#73671).
- Control UI mobile chat settings persisted via Lit state (b1c5152
  by @BunsDev).

Skipped Peter-only commits with no external collaborator (per the
maintainer-attribution rule against thanking @steipete) and the model
list auth-index series (already covered by the existing "Models/UI:
hide unauthenticated providers" entry).
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…dbox mode is off (openclaw#73671)

Merged via squash.

Prepared head SHA: 378851c
Co-authored-by: kaseonedge <15183881+kaseonedge@users.noreply.github.com>
Co-authored-by: sallyom <11166065+sallyom@users.noreply.github.com>
Reviewed-by: @sallyom
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Adds six missing entries for commits that landed without their own
CHANGELOG.md update, picked from the last six hours of origin/main and
attributed to the original contributors.

Changes:
- Control UI/i18n locale registry expansion + new docs glossaries
  (3562a87, ffe547a by @vincentkoc).
- Gateway/diagnostics opt-in startup timeline (b6df431, e7119aa,
  ca63cda by @shakkernerd).

Fixes:
- Matrix `verify confirm-sas` cross-signing close (6dda69a by
  @nklock; openclaw#74542).
- `openclaw status` channel context-window overrides (1b3a248 by
  @HemantSudarshan).
- Sandbox Docker daemon graceful when sandbox mode is off (0336aa2
  by @kaseonedge; openclaw#73671).
- Control UI mobile chat settings persisted via Lit state (25b2db8
  by @BunsDev).

Skipped Peter-only commits with no external collaborator (per the
maintainer-attribution rule against thanking @steipete) and the model
list auth-index series (already covered by the existing "Models/UI:
hide unauthenticated providers" entry).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations docker Docker and sandbox tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants