[Ubuntu 24.04][CLI&UX] nemoclaw status/connect EXIT=0 when gateway port held; status omits down state #3386

@zNeill

Description

When the OpenShell gateway container is stopped and removed and its host port 8080 is held by an unrelated process, NemoClaw v0.0.38 still exits 0. `connect` reports the failure to the user as text, while `status` does not mention the gateway at all:

  `nemoclaw status`: prints sandbox info + `● cloudflared (stopped)`, no gateway-down message, EXIT=0
  `nemoclaw <sandbox> connect`: prints "Unable to verify sandbox 'X' against the live OpenShell gateway. Error: × No active gateway. ..." then EXIT=0

This breaks shell-script and CI usage: callers cannot reliably detect the unhealthy state from the exit code. The status output also omits an explicit "gateway: down/unreachable" line so a human reader sees only the sandbox info and `cloudflared (stopped)` and may not realize the gateway itself is offline.
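
Until the exit codes are fixed, a scripted caller has to inspect the output text as well as the exit code. A minimal sketch of such a workaround follows. The helper name `effective_exit_code` and the marker list are assumptions made for illustration (the marker strings are copied from the connect output quoted in this report); none of this is part of NemoClaw.

```python
# Hypothetical caller-side workaround; not part of NemoClaw.
# Marker strings are taken from the connect output quoted in this report.
FAILURE_MARKERS = (
    "No active gateway",
    "Unable to verify sandbox",
)


def effective_exit_code(returncode: int, output: str) -> int:
    """Return a non-zero code when the output reports a failure the exit code hides."""
    if returncode != 0:
        # The command already signalled failure; pass its code through.
        return returncode
    if any(marker in output for marker in FAILURE_MARKERS):
        # Exit code says success, but the text says the gateway is missing.
        return 1
    return 0
```

A caller would pass in the return code and the combined stdout/stderr of the command. Note that this only helps for `connect`: the `status` output in this report contains no gateway message at all, so there is nothing for a text scan to match.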

Two related bugs were filed and verified previously:
  6112259 — "[macOS][CLI&UX] nemoclaw status reports 'Inference: healthy' while gateway is down, exits 0" — Fixed 4/29, Verified 5/6 (macOS only)
  6125389 — "[Ubuntu 22.04][CLI&UX] nemoclaw status and list return empty output, exit 0 when container is stopped and gateway port is held" — Fixed 5/10, Verified 5/10

The current observation on Ubuntu 24.04 + NemoClaw v0.0.38 (installed today) is:
  (a) status is NOT empty (so 6125389's "empty output" fix may be in place), BUT
  (b) status output STILL exits 0 in the degraded state (6112259-class behavior), AND
  (c) `nemoclaw connect` ALSO exits 0 after explicitly printing "Error: × No active gateway"

So either (i) the existing fixes did not cover the connect path, (ii) 6112259 has regressed on Linux, or both.

Environment
Device:        2u1g-x570-1795 (10.63.136.90)
OS:            Ubuntu 24.04.4 LTS
Architecture:  x86_64
NemoClaw:      v0.0.38
OpenShell CLI: openshell 0.0.36
Docker:        29.4.3
Node.js:       v22.22.2
Sandbox:       port-conflict-test (manufactured for the test;
               provider=ollama-local, model=qwen2.5:7b, balanced)

Steps to Reproduce
1. Fresh NemoClaw v0.0.38 install with at least one registered sandbox.
2. Stop and remove the gateway container so port 8080 is free:
     docker stop openshell-cluster-nemoclaw
     docker rm openshell-cluster-nemoclaw
3. Hold port 8080 with an unrelated process:
     sudo python3 -c "import socket,time; s=socket.socket(); \
       s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1); \
       s.bind(('0.0.0.0', 8080)); s.listen(); time.sleep(120)" &
4. Run:
     nemoclaw status; echo EXIT=$?
     nemoclaw <sandbox> connect; echo EXIT=$?
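
For repeated runs, step 3 can be written as a small script instead of a one-liner. This is a sketch only: `hold_port` and `port_is_held` are hypothetical helper names, and the bind-based probe is just one way to confirm that the port is taken.

```python
import socket


def hold_port(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind and listen on the port, as the one-liner in step 3 does.

    The caller keeps the returned socket open to hold the port and
    closes it to release the port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen()
    return s


def port_is_held(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a plain bind to the port fails, i.e. something holds it."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError:
        return True
    finally:
        probe.close()
    return False
```

Calling `hold_port(8080)` and then sleeping for 120 seconds reproduces step 3. Port 8080 is above the default privileged range on Linux, so `sudo` is likely unnecessary on a default configuration.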

Expected Result
1. `nemoclaw status` SHOULD exit non-zero (e.g. exit 1) AND include an explicit gateway-down line:
     gateway: down  (port 8080 held by another process; restart docker / kill holder, then run nemoclaw onboard --resume)
2. `nemoclaw <sandbox> connect` SHOULD exit non-zero (e.g. exit 1) after printing the "Error: × No active gateway" message, so shell-script callers can detect failure.
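
The two expectations above could be encoded as a regression check. A minimal sketch, assuming the gateway-down line starts with `gateway:` as proposed in item 1; `expectation_failures` is a hypothetical helper, not an existing NemoClaw test.

```python
def expectation_failures(command: str, returncode: int, output: str) -> list[str]:
    """List which expectations from this report a captured run violates.

    Assumes the run was captured while the gateway was down.
    """
    failures = []
    if returncode == 0:
        failures.append(f"{command}: exited 0 while the gateway is down")
    # Assumption: the explicit status line begins with "gateway:".
    if command == "status" and "gateway:" not in output.lower():
        failures.append("status: no explicit gateway line in output")
    if command == "connect" and "No active gateway" not in output:
        failures.append("connect: missing 'No active gateway' error text")
    return failures
```

Applied to the Actual Result transcript in this report, the status run fails both status checks, and the connect run fails only the exit-code check.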

Actual Result
$ nemoclaw status; echo EXIT=$?

  Sandboxes:
    port-conflict-test * (qwen2.5:7b) :18789

  ● cloudflared  (stopped)

EXIT=0

$ nemoclaw port-conflict-test connect; echo EXIT=$?
  Unable to verify sandbox 'port-conflict-test' against the live OpenShell gateway.
Error:   × No active gateway.
  │ Set one with: openshell gateway select
  │ Or deploy a new gateway: openshell gateway start
  Check `openshell status` and the active gateway, then retry.
EXIT=0

Notes:
  - status output never explicitly says "gateway: down" or similar — only the
    cloudflared subsystem is reported. A reader of the output cannot tell
    whether the gateway is up, slow, or absent.
  - connect output IS visually clear ("Error: × No active gateway. ...
    Check openshell status and the active gateway, then retry.") but the
    exit code does not reflect that error — CI/scripted callers cannot
    branch on it.

Logs
Full T560863 capture:
  /home/lab/day0-automation/20260511/report-T560863.txt

Related fixed bugs (may be incomplete on Linux):
  6112259 — macOS status exit 0 (Fixed/Verified)
  6125389 — Ubuntu 22.04 status empty + exit 0 (Fixed/Verified)
  6122173 — DGX Spark/Ubuntu 24.04 status omits Connected/Inference fields, cloudflared no context (Fixed in main, not in v0.0.38 release)

This new report covers the connect path and an explicit gateway-down line missing from status output on Ubuntu 24.04 + v0.0.38.

Bug Details

Priority:     Unprioritized
Action:       Dev - Open - To fix
Disposition:  Open issue
Module:       Machine Learning - NemoClaw
Keyword:      NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL

[NVB#6166959]

Labels

NV QA (bugs found by the NVIDIA QA Team)
area: cli (command line interface, flags, terminal UX, or output)
platform: ubuntu (affects Ubuntu Linux environments)