[NemoClaw][macOS][CLI&UX] nemoclaw status reports "Inference: healthy" while gateway is down, exits 0 #2595

@zNeill

Description


After `docker kill` on the openshell gateway container, `nemoclaw status`
prints "Inference: healthy" at the top of its output even though the gateway is down
and refusing connections. The recovery-attempt lines and the actual error appear LATER
in the output, after the stale "healthy" line, so a user who reads only the first screen
is misled into thinking the sandbox is fine. Additionally, the command exits 0 even
when the gateway is unreachable.

Environment

Device:        MacBook Pro (Apple M3 Pro, 36GB)
OS:            macOS Darwin 25.3.0 (arm64)
Architecture:  arm64
Node.js:       v24.5.0
npm:           10.9.7
Docker:        Docker Desktop / colima
OpenShell CLI: openshell 0.0.26
NemoClaw:      v0.0.21
OpenClaw:      2026.4.2 (d74a122)

Steps to Reproduce

1. Have a running sandbox (e.g. `my-assistant`)
2. Run: `docker kill $(docker ps -q --filter name=openshell)`
3. Run: `nemoclaw my-assistant status`
4. Read the output top-to-bottom

Expected Result

1. status identifies the degraded state on the FIRST lines printed
   e.g. "Gateway: offline / connection refused"
2. No "healthy" claim anywhere in the output until the gateway is actually back
3. If auto-recovery is attempted, that should come BEFORE any status line that
   could be read as "everything is fine"
4. status exits non-zero when the gateway cannot be reached
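
Expectations 2 and 4 above can be checked mechanically once the output and exit code of a `status` run are captured. The following is a minimal sketch, not part of NemoClaw: `check_degraded_status` is a hypothetical helper name, and it assumes the gateway has already been killed as in step 2. Expectations 1 and 3 depend on the exact wording of the fixed output, so they are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NemoClaw): checks a captured `status` run
# against expectations 2 and 4 while the gateway is down.
#   $1 = exit code of the status command
#   $2 = captured output of the status command
check_degraded_status() {
  local exit_code="$1" output="$2" failures=0

  # Expectation 4: the exit code must be non-zero when the gateway is unreachable.
  if [ "$exit_code" -eq 0 ]; then
    echo "FAIL: status exited 0 while the gateway was down"
    failures=$((failures + 1))
  fi

  # Expectation 2: no "healthy" claim anywhere in the output.
  # -w matches whole words only, so "unhealthy" is not flagged.
  if printf '%s\n' "$output" | grep -qiw 'healthy'; then
    echo "FAIL: output contains a \"healthy\" claim"
    failures=$((failures + 1))
  fi

  # Succeed only if no expectation was violated.
  [ "$failures" -eq 0 ]
}

# Usage, run after step 2 of the reproduction:
#   out=$(nemoclaw my-assistant status 2>&1); rc=$?
#   check_degraded_status "$rc" "$out" && echo "PASS"
```

Against the output captured in this report, both checks would fail: the exit code is 0 and the first block contains "Inference: healthy".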
Actual Result

First block of output claims healthy:
  Sandbox: my-assistant
    Model:    nvidia/nemotron-3-super-120b-a12b
    Provider: nvidia-prod
    Inference: healthy (https://integrate.api.nvidia.com/v1/models)   <-- MISLEADS
    GPU:      yes
    Connected: no
    Agent:    OpenClaw v2026.4.2

Auto-recovery block appears AFTER the healthy block:
  ✓ Active gateway set to 'nemoclaw'
  [2/8] Starting OpenShell gateway
  ✓ Reusing existing gateway

Real error block appears LAST:
  Sandbox 'my-assistant' may still exist, but the selected NemoClaw gateway
  is still refusing connections after restart.
  Error:   × client error (Connect)
    ├─▶ tcp connect error
    ╰─▶ Connection refused (os error 61)

EXIT=0  (despite the error)

Logs

$ docker kill $(docker ps -q --filter name=openshell)
5baa33c9c1b6

$ nemoclaw my-assistant status; echo $?

  Sandbox: my-assistant
    Model:    nvidia/nemotron-3-super-120b-a12b
    Provider: nvidia-prod
    Inference: healthy (https://integrate.api.nvidia.com/v1/models)
    ...
✓ Active gateway set to 'nemoclaw'
  [2/8] Starting OpenShell gateway
  ✓ Reusing existing gateway

  Sandbox 'my-assistant' may still exist, but the selected NemoClaw gateway
  is still refusing connections after restart.
  Error:   × client error (Connect)
    ├─▶ tcp connect error
    ╰─▶ Connection refused (os error 61)
0

Notes

- "Inference: healthy" appears to be derived from a probe of the provider endpoint
  (integrate.api.nvidia.com) without first checking that the local gateway is
  reachable. Because the probe goes directly to the provider, it reports "healthy"
  whenever the provider itself is up, regardless of the gateway's state.
- After docker restart, the OpenClaw gateway INSIDE the sandbox does NOT auto-restart.
  `connect` then says "OpenClaw gateway is not running inside the sandbox" and
  "Could not restart OpenClaw gateway automatically".
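
One way to get the ordering described in the first note is to gate the inference line on a local gateway check. The sketch below only illustrates that ordering and is not NemoClaw's implementation: `report_status` is a hypothetical function, the two check commands are placeholders supplied by the caller, and every output string other than "Gateway: offline / connection refused" and "Inference: healthy" is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a gateway-first status report.
#   $1 = command that succeeds only if the local gateway is reachable (placeholder)
#   $2 = command that succeeds only if the provider endpoint answers (placeholder)
report_status() {
  local gateway_check="$1" provider_check="$2"

  # Check the local gateway first, so a degraded state is the first thing printed.
  if ! "$gateway_check"; then
    echo "Gateway: offline / connection refused"
    # Do not probe or report the provider while the gateway is down.
    echo "Inference: not checked (gateway unreachable)"
    return 1
  fi
  echo "Gateway: online"

  # Only now is a health claim about inference meaningful to the user.
  if "$provider_check"; then
    echo "Inference: healthy"
  else
    echo "Inference: provider unreachable"
    return 1
  fi
}

# Usage with stand-in checks (`false` simulates a dead gateway):
#   report_status false true; echo "exit=$?"
```

With this ordering, the first line printed reflects the gateway state, and the function returns non-zero whenever either check fails, which matches expected results 1, 2, and 4.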

Bug Details

Priority:     Unprioritized
Action:       Dev - Open - To fix
Disposition:  Open issue
Module:       Machine Learning - NemoClaw
Keyword:      NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL

[NVB#6112259]

Labels

- NV QA (bugs found by the NVIDIA QA Team)
- area: cli (command line interface, flags, terminal UX, or output)
- platform: macos (affects macOS, including Apple Silicon)
