Description
When the OpenShell gateway container is stopped/removed and its host port 8080 is held by an unrelated process, NemoClaw v0.0.38 reports the failure to the user as text but still exits 0:
`nemoclaw status` — prints sandbox info + `● cloudflared (stopped)`, no gateway-down message, EXIT=0
`nemoclaw connect` — prints "Unable to verify sandbox 'X' against the live OpenShell gateway. Error: × No active gateway. ..." then EXIT=0
This breaks shell-script and CI usage: callers cannot detect the unhealthy state from the exit code. The status output also omits an explicit "gateway: down/unreachable" line, so a human reader sees only the sandbox info and `cloudflared (stopped)` and may not realize the gateway itself is offline.
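Until the exit codes are fixed, the only signal a script has is the printed text. A minimal sketch of that workaround follows, assuming the message strings quoted in this report stay stable. `output_reports_gateway_down` is a hypothetical helper name, not part of NemoClaw or OpenShell.

```shell
# Hypothetical helper (not part of NemoClaw): scan captured CLI output for
# the gateway-down messages quoted in this report.
# Returns 0 if one of them is present, 1 otherwise.
output_reports_gateway_down() {
    # $1: captured stdout+stderr of a nemoclaw command
    case "$1" in
        *"No active gateway"*) return 0 ;;
        *"Unable to verify sandbox"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch (needs nemoclaw on PATH, so it is left commented out):
#   out=$(nemoclaw port-conflict-test connect 2>&1)
#   if output_reports_gateway_down "$out"; then exit 1; fi
```

This only helps for `connect`. The `status` output in this state contains no gateway message at all, so text matching cannot detect the failure there, which is why the exit code matters.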
Two related bugs were filed and verified previously:
6112259 — "[macOS][CLI&UX] nemoclaw status reports 'Inference: healthy' while gateway is down, exits 0" — Fixed 4/29, Verified 5/6 (macOS only)
6125389 — "[Ubuntu 22.04][CLI&UX] nemoclaw status and list return empty output, exit 0 when container is stopped and gateway port is held" — Fixed 5/10, Verified 5/10
The current observation on Ubuntu 24.04 + NemoClaw v0.0.38 (installed today) is:
(a) status is NOT empty (so 6125389's "empty output" fix may be in place), BUT
(b) status output STILL exits 0 in the degraded state (6112259-class behavior), AND
(c) `nemoclaw connect` ALSO exits 0 after explicitly printing "Error: × No active gateway"
So either the existing fixes did not cover the connect path, or 6112259 has regressed on Linux, or both.
Environment
Device: 2u1g-x570-1795 (10.63.136.90)
OS: Ubuntu 24.04.4 LTS
Architecture: x86_64
NemoClaw: v0.0.38
OpenShell CLI: openshell 0.0.36
Docker: 29.4.3
Node.js: v22.22.2
Sandbox: port-conflict-test (manufactured for the test; provider=ollama-local, model=qwen2.5:7b, balanced)
Steps to Reproduce
1. Fresh NemoClaw v0.0.38 install with at least one registered sandbox.
2. Stop and remove the gateway container so port 8080 is free:
docker stop openshell-cluster-nemoclaw
docker rm openshell-cluster-nemoclaw
3. Hold port 8080 with an unrelated process:
sudo python3 -c "import socket,time; s=socket.socket(); \
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1); \
s.bind(('0.0.0.0', 8080)); s.listen(); time.sleep(120)" &
4. Run:
nemoclaw status; echo EXIT=$?
nemoclaw port-conflict-test connect; echo EXIT=$?
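Before step 4, it may be worth confirming that the holder from step 3 is actually listening on port 8080, since otherwise the repro tests a different state. A small sketch, assuming `ss` (iproute2) is available on the host. `port_is_listening` is a hypothetical helper that parses `ss -ltn` output from stdin.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and succeed if any
# LISTEN socket is bound to the given local port.
port_is_listening() {
    # $1: port number to look for in the "Local Address:Port" column
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage sketch:
#   ss -ltn | port_is_listening 8080 && echo "port 8080 is held"
```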
Expected Result
1. `nemoclaw status` SHOULD exit non-zero (e.g. exit 1) AND include an explicit gateway-down line:
gateway: down (port 8080 held by another process; restart docker / kill holder, then run nemoclaw onboard --resume)
2. `nemoclaw connect` SHOULD exit non-zero (e.g. exit 1) after printing the "Error: × No active gateway" message, so shell-script callers can detect failure.
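Once both commands exit non-zero in this state, a scripted caller could gate on them directly. The sketch below shows the kind of caller this report has in mind. `run_and_report` is a hypothetical wrapper, not part of NemoClaw. It prints the exit code in the same `EXIT=<n>` form used in this report and passes the code through.

```shell
# Hypothetical wrapper: run a command, print its exit code as EXIT=<n>,
# and return that same code to the caller.
run_and_report() {
    rc=0
    "$@" || rc=$?   # capture the code without tripping `set -e`
    echo "EXIT=$rc"
    return "$rc"
}

# Usage sketch (needs nemoclaw on PATH, so it is left commented out):
#   run_and_report nemoclaw status || exit 1
#   run_and_report nemoclaw port-conflict-test connect || exit 1
```

With the behavior in Actual Result, both calls would print `EXIT=0` and the script would continue against an offline gateway.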
Actual Result
$ nemoclaw status; echo EXIT=$?
Sandboxes:
port-conflict-test * (qwen2.5:7b) :18789
● cloudflared (stopped)
EXIT=0
$ nemoclaw port-conflict-test connect; echo EXIT=$?
Unable to verify sandbox 'port-conflict-test' against the live OpenShell gateway.
Error: × No active gateway.
│ Set one with: openshell gateway select
│ Or deploy a new gateway: openshell gateway start
Check `openshell status` and the active gateway, then retry.
EXIT=0
Notes:
- status output never explicitly says "gateway: down" or similar — only the
cloudflared subsystem is reported. A reader of the output cannot tell
whether the gateway is up, slow, or absent.
- connect output IS visually clear ("Error: × No active gateway. ...
Check openshell status and the active gateway, then retry.") but the
exit code does not reflect that error — CI/scripted callers cannot
branch on it.
Logs
Full T560863 capture:
/home/lab/day0-automation/20260511/report-T560863.txt
Related fixed bugs (may be incomplete on Linux):
6112259 — macOS status exit 0 (Fixed/Verified)
6125389 — Ubuntu 22.04 status empty + exit 0 (Fixed/Verified)
6122173 — DGX Spark/Ubuntu 24.04 status omits Connected/Inference fields, cloudflared no context (Fixed in main, not in v0.0.38 release)
This new report covers the connect path and an explicit gateway-down line missing from status output on Ubuntu 24.04 + v0.0.38.
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL |
[NVB#6166959]