[Ubuntu 24.04][CLI] Docker restart leaves sandbox stuck in Provisioning and connect times out #4428

@zNeill

Description

When Docker is stopped while a healthy NemoClaw sandbox exists, status/connect/logs do not surface the expected actionable gateway-down recovery message. The sandbox is reported as stuck in Provisioning and connect waits until timeout, which makes a transient Docker outage look like a sandbox rebuild problem.

Environment

Device: Ubuntu server ipp2-0296 (10.176.178.242)
OS: Ubuntu 24.04.4 LTS
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: Docker version 29.1.3, build 29.1.3-0ubuntu3~24.04.2
OpenShell CLI: openshell 0.0.44
NemoClaw: nemoclaw v0.0.53
OpenClaw: v2026.5.22

Steps to Reproduce

  1. Start from a healthy sandbox named v053-baseline.
  2. Run: nemoclaw v053-baseline status
  3. Stop Docker: sudo systemctl stop docker.socket docker
  4. Confirm Docker is down: docker info
  5. Run: nemoclaw list
  6. Run: nemoclaw v053-baseline status
  7. Run: timeout 20 nemoclaw v053-baseline connect

Expected Result

During the Docker outage, status/connect/policy-list should report a clearly named gateway/runtime-down condition with actionable recovery guidance. Connect must not hang waiting on a TTY. Recovery should be to restart Docker only; the user should not be directed toward rebuild/destroy/onboard for a transient Docker outage.

Actual Result

Docker is confirmed down:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

status prints sandbox metadata with Phase: Provisioning instead of the expected gateway-down classification:
Sandbox: v053-baseline
Provider: nvidia-prod
Inference: healthy (https://integrate.api.nvidia.com/v1/models)
Agent: OpenClaw v2026.5.22
Phase: Provisioning
Sandbox 'v053-baseline' is stuck in 'Provisioning' phase.
Run nemoclaw v053-baseline rebuild --yes to recreate the sandbox (--yes skips the confirmation prompt; workspace state will be preserved).
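
The output above suggests the phase is interpreted without first confirming that the container runtime is reachable. As a rough illustration only (this is not NemoClaw's actual implementation), a preflight could probe Docker with `docker info` before trusting the reported phase. The function names `docker_is_up` and `classify_sandbox_state` and the `runtime-down` label are hypothetical.

```shell
# Hypothetical sketch: confirm the runtime is reachable before trusting the phase.
# Function names and the "runtime-down" label are illustrative assumptions,
# not part of the NemoClaw CLI.

# Returns 0 if the Docker daemon answers `docker info`, non-zero otherwise.
docker_is_up() {
  docker info >/dev/null 2>&1
}

# $1: phase string as reported for the sandbox (e.g. "Provisioning").
# Prints "runtime-down" when Docker is unreachable, otherwise echoes the phase.
classify_sandbox_state() {
  phase=$1
  if ! docker_is_up; then
    # The daemon is unreachable, so the reported phase cannot be trusted.
    echo "runtime-down"
    return 0
  fi
  echo "$phase"
}
```

With Docker stopped, `classify_sandbox_state Provisioning` prints `runtime-down`, at which point the guidance would be to start Docker again (for example `sudo systemctl start docker`) rather than to rebuild the sandbox.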

connect does not return the expected actionable block; the test had to wrap it with timeout:
timeout 20 nemoclaw v053-baseline connect

Logs

Relevant markers:

  • baseline status was healthy before Docker stop
  • docker info failed while Docker was stopped
  • status showed Phase: Provisioning
  • connect required timeout 20
  • logs reported sandbox not ready (phase: Provisioning)
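
The "connect required timeout 20" marker can be checked mechanically. GNU coreutils `timeout` exits with status 124 when the command is still running at the limit, so a test can tell a hang apart from an ordinary failure. The sketch below assumes GNU `timeout` is available; the helper name `run_with_limit` and its verdict strings are hypothetical.

```shell
# Hypothetical helper: run a command under a time limit and report the outcome.
# Relies on GNU coreutils `timeout`, which exits with 124 when the limit is hit.
# $1: limit in seconds; remaining arguments: the command to run.
run_with_limit() {
  limit=$1
  shift
  # Send the command's own output to stderr so stdout carries only the verdict.
  timeout "$limit" "$@" >&2
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung"
  elif [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "failed"
  fi
}
```

For the behavior reported here, `run_with_limit 20 nemoclaw v053-baseline connect` would be expected to print `hung`; once connect fails fast with an actionable message, it should print `failed` instead.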

Bug Details

Priority: Unprioritized
Action: Dev - Open - To fix
Disposition: Open issue
Module: Machine Learning - NemoClaw
Keyword: NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Policy&Network, NemoClaw_Sandbox

[NVB#6235632]

Metadata

Labels

  • NV QA (Bugs found by the NVIDIA QA Team)
  • area: cli (Command line interface, flags, terminal UX, or output)
  • platform: container (Affects Docker, containerd, Podman, or images)
  • platform: ubuntu (Affects Ubuntu Linux environments)
