Description
Description
When Docker is stopped while a healthy NemoClaw sandbox exists, status/connect/logs do not surface the expected actionable gateway-down recovery message. The sandbox is reported as stuck in Provisioning and connect waits until timeout, which makes a transient Docker outage look like a sandbox rebuild problem.Environment
Device: Ubuntu server ipp2-0296 (10.176.178.242)
OS: Ubuntu 24.04.4 LTS
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: Docker version 29.1.3, build 29.1.3-0ubuntu3~24.04.2
OpenShell CLI: openshell 0.0.44
NemoClaw: nemoclaw v0.0.53
OpenClaw: v2026.5.22Steps to Reproduce
- Start from a healthy sandbox named v053-baseline.
- Run: nemoclaw v053-baseline status
- Stop Docker: sudo systemctl stop docker.socket docker
- Confirm Docker is down: docker info
- Run: nemoclaw list
- Run: nemoclaw v053-baseline status
- Run: timeout 20 nemoclaw v053-baseline connect Expected Result
During the Docker outage, status/connect/policy-list should report a clearly named gateway/runtime-down condition with actionable recovery guidance. Connect must not hang waiting on a TTY. Recovery should be to restart Docker only; the user should not be directed toward rebuild/destroy/onboard for a transient Docker outage.Actual Result
Docker is confirmed down:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
status prints sandbox metadata with Phase: Provisioning instead of the expected gateway-down classification:
Sandbox: v053-baseline
Provider: nvidia-prod
Inference: healthy (https://integrate.api.nvidia.com/v1/models)
Agent: OpenClaw v2026.5.22
Phase: Provisioning
Sandbox 'v053-baseline' is stuck in 'Provisioning' phase.
Run nemoclaw v053-baseline rebuild --yes to recreate the sandbox (--yes skips the confirmation prompt; workspace state will be preserved).
connect does not return the expected actionable block; the test had to wrap it with timeout:
timeout 20 nemoclaw v053-baseline connect Logs
Relevant markers:
- baseline status was healthy before Docker stop
- docker info failed while Docker was stopped
- status showed Phase: Provisioning
- connect required timeout 20
- logs reported sandbox not ready (phase: Provisioning)
Bug Details
| Field |
Value |
| Priority |
Unprioritized |
| Action |
Dev - Open - To fix |
| Disposition |
Open issue |
| Module |
Machine Learning - NemoClaw |
| Keyword |
NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Policy&Network, NemoClaw_Sandbox |
[NVB#6235632]
Description
Description
When Docker is stopped while a healthy NemoClaw sandbox exists, status/connect/logs do not surface the expected actionable gateway-down recovery message. The sandbox is reported as stuck in Provisioning and connect waits until timeout, which makes a transient Docker outage look like a sandbox rebuild problem.Environment
Device: Ubuntu server ipp2-0296 (10.176.178.242)
OS: Ubuntu 24.04.4 LTS
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: Docker version 29.1.3, build 29.1.3-0ubuntu3~24.04.2
OpenShell CLI: openshell 0.0.44
NemoClaw: nemoclaw v0.0.53
OpenClaw: v2026.5.22Steps to Reproduce
During the Docker outage, status/connect/policy-list should report a clearly named gateway/runtime-down condition with actionable recovery guidance. Connect must not hang waiting on a TTY. Recovery should be to restart Docker only; the user should not be directed toward rebuild/destroy/onboard for a transient Docker outage.Actual Result
Docker is confirmed down:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
status prints sandbox metadata with Phase: Provisioning instead of the expected gateway-down classification:
Sandbox: v053-baseline
Provider: nvidia-prod
Inference: healthy (https://integrate.api.nvidia.com/v1/models)
Agent: OpenClaw v2026.5.22
Phase: Provisioning
Sandbox 'v053-baseline' is stuck in 'Provisioning' phase.
Run
nemoclaw v053-baseline rebuild --yesto recreate the sandbox (--yes skips the confirmation prompt; workspace state will be preserved).connect does not return the expected actionable block; the test had to wrap it with timeout:
timeout 20 nemoclaw v053-baseline connect Logs
Relevant markers:
Bug Details
[NVB#6235632]