Description
In Docker-driver mode the sandbox container's HEALTHCHECK runs pgrep --ignore-ancestors -f 'openclaw[ -]gateway' inside the container to verify that the gateway process is alive. In this mode, however, the gateway process (openshell-gateway) is launched by the OpenShell CLI and runs on the host, not inside the sandbox container. The pgrep probe therefore always finds no match and exits 1, so the HEALTHCHECK never transitions out of (unhealthy).
The container is marked (unhealthy) from the very first check (30s after start) and stays unhealthy indefinitely. NemoClaw itself correctly reports Phase: Ready because it queries OpenShell state, not Docker health, but any tooling or monitoring that inspects docker ps or the Docker API will show the container as permanently degraded.
Environment
Device: KVM VM (10.57.211.27, x86_64)
OS: Ubuntu 24.04.4 LTS (kernel 6.17.0-23)
Docker: 29.5.2
OpenShell CLI: 0.0.44 (docker)
NemoClaw: v0.0.53-9-gea10007
OpenClaw: 2026.5.22
Gateway mode: Docker driver (gateway runs as host process)
Steps to Reproduce
- On Ubuntu 24.04 (Docker mode), onboard a fresh sandbox:
NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 \
NVIDIA_API_KEY=<key> nemoclaw onboard --fresh --name my-assistant
- Wait ~90 seconds for the initial healthcheck window to expire.
- Check Docker container health:
docker ps -a --format "{{.Names}}: {{.Status}}" | grep openshell
- Inspect healthcheck logs:
docker inspect <container> --format "{{json .State.Health.Log}}" | python3 -m json.tool
- Manually run the probe inside the container:
docker exec <container> bash -c 'pgrep --ignore-ancestors -f "openclaw[ -]gateway"; echo $?'
- Check that the gateway is actually running on the host:
pgrep -a openshell-gateway # non-empty on host
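The waiting and status-checking steps above could be scripted roughly as follows. This is only a sketch: the function names health_status and wait_until_settled, the 5-second poll interval, and the 120-second default timeout are illustrative assumptions, not part of NemoClaw or OpenShell. It relies only on docker inspect and the .State.Health.Status field.

```shell
#!/bin/sh
# Sketch only: function names, poll interval, and default timeout are
# hypothetical choices made for this reproduction helper.

# Print the Docker health status of a container:
# "starting", "healthy", or "unhealthy".
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll until the status leaves "starting" or the timeout (in seconds)
# elapses, then print the last observed status.
wait_until_settled() {
  container=$1
  timeout=${2:-120}
  waited=0
  while [ "$(health_status "$container")" = "starting" ] &&
        [ "$waited" -lt "$timeout" ]; do
    sleep 5
    waited=$((waited + 5))
  done
  health_status "$container"
}
```

After sourcing the file, wait_until_settled <container> prints the final status. With the behaviour described in this report, that status would be unhealthy even while nemoclaw my-assistant status shows Phase: Ready.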
Expected Result
The Docker HEALTHCHECK should pass when the sandbox is functional. In Docker mode the probe should check a service that runs inside the container (e.g. the OpenClaw gateway HTTP endpoint at http://127.0.0.1:<port>/health inside the container's network namespace), or be disabled entirely for Docker-driver deployments where the gateway is a host-side process.
Actual Result
openshell-my-assistant-dfd9ebfe-...: Up 5 minutes (unhealthy)
Healthcheck log — all 5 entries:
{"ExitCode":1,"Output":""} (repeated every 30s)
docker exec <container> pgrep -f "openclaw[ -]gateway"
→ (empty — gateway not in container, exits 1)
pgrep -a openshell-gateway → PID present on host
nemoclaw my-assistant status → Phase: Ready ✓
docker ps → (unhealthy) ✗
Suggested Fix
In the Dockerfile HEALTHCHECK (or the container startup script that configures it), detect or parameterize the gateway mode:
- Docker-driver mode: probe the in-container OpenClaw HTTP endpoint directly (e.g. curl -sf http://127.0.0.1:18789/health) rather than looking for an external process. The endpoint is reachable inside the container's network namespace because the port is forwarded.
- Alternatively, skip the pgrep branch in Docker mode and rely solely on the /tmp/gateway.log non-empty check plus the HTTP probe.
The shouldUseContainerizedGateway flag in src/lib/onboard/docker-driver-gateway-launch.ts already encodes the Docker-vs-k3s distinction at runtime; the same condition should drive which HEALTHCHECK variant is embedded in the image.
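One possible shape for a mode-aware healthcheck script is sketched below. The GATEWAY_MODE and GATEWAY_PORT variables, the mode names docker-driver and k3s, and the function names are all hypothetical. How the mode would actually reach the container (build argument, environment variable, or separate image variants) is left open. The port 18789 and the /health path are taken from the suggestion above, and the sketch leaves out the /tmp/gateway.log check.

```shell
#!/bin/sh
# Sketch only: GATEWAY_MODE, GATEWAY_PORT, the mode names, and the function
# names are hypothetical. Port 18789 and /health come from the suggested fix.

# Print which probe to use for a given gateway mode.
select_probe() {
  case "$1" in
    docker-driver) echo http ;;   # gateway process lives on the host
    *)             echo pgrep ;;  # gateway process lives in the container
  esac
}

# Run the selected probe; exit status 0 means healthy.
run_probe() {
  case "$(select_probe "$1")" in
    http)
      curl -sf "http://127.0.0.1:${GATEWAY_PORT:-18789}/health" >/dev/null
      ;;
    pgrep)
      pgrep --ignore-ancestors -f 'openclaw[ -]gateway' >/dev/null
      ;;
  esac
}

# Probe only when invoked with --run, so the functions can also be sourced.
if [ "${1:-}" = "--run" ]; then
  run_probe "${GATEWAY_MODE:-k3s}"
  exit $?
fi
```

A HEALTHCHECK instruction could then call the script with --run, keeping the existing pgrep behaviour as the default and switching to the HTTP probe only when the mode is set to the Docker-driver value.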
NVB#6240502