Background
#2666 (closed by #3270) fixed the user-visible silent-exit-0 regression: nemoclaw <name> status and nemoclaw list now always produce output and never return exit 0 with empty stdout/stderr. That closes the bug as filed.
The original AC for status also asked for one piece of UX polish that was deliberately split out of the bug fix to keep its scope tight:
Print a clearly delimited block naming the failing layer (Docker daemon up but container exited; or container exit + foreign-port-conflict; or gateway not reachable)
What status prints today in the gateway-failure path is the existing generic message + printGatewayLifecycleHint text. That's actionable, but it doesn't distinguish the three named layers the reporter called out.
Proposal
Add a small failing-layer classifier, called from src/lib/actions/sandbox/status.ts and src/lib/actions/sandbox/gateway-state.ts:printGatewayLifecycleHint, that prints a layer-named header before the existing actionable hints.
Detect, in order:
1. docker_unreachable: docker info fails or times out. Daemon down or socket inaccessible.
2. container_exited_port_conflict: docker ps --filter name=openshell-cluster-nemoclaw shows no running container, docker ps -a shows it exited, AND something is listening on the gateway port (host port held by a foreign process).
3. container_exited: same as 2 but no foreign listener on the gateway port.
4. gateway_unreachable: container is running but the gateway API does not respond (current generic case).
For each layer, print a one-line "what's wrong" header followed by the existing recovery hints from printGatewayLifecycleHint.
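The detection order above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: every name in it (FailureLayer, ContainerState, ProbeRunners, decideLayer, classifyGatewayFailure, LAYER_HEADERS, formatLayerHeader) and the header wording are hypothetical, and the "unknown" fallback for a container that is neither running nor exited is an assumption. It also assumes the classifier is only called once the gateway is already known to be failing.

```typescript
// Illustrative sketch only: all names and header strings below are hypothetical
// and not taken from the NemoClaw source.

type ContainerState = "running" | "exited" | "unknown";

type FailureLayer =
  | "docker_unreachable"
  | "container_exited_port_conflict"
  | "container_exited"
  | "gateway_unreachable"
  | "unknown"; // assumption: fallback when the container is neither running nor exited

// Pure decision step: maps probe results to a layer, checked in the listed order.
function decideLayer(
  dockerReachable: boolean,
  container: ContainerState,
  portInUse: boolean,
): FailureLayer {
  if (!dockerReachable) return "docker_unreachable";
  if (container === "exited") {
    return portInUse ? "container_exited_port_conflict" : "container_exited";
  }
  if (container === "running") return "gateway_unreachable";
  return "unknown";
}

// Injected probes, so tests can simulate each layer without Docker or sockets.
interface ProbeRunners {
  dockerInfo: () => Promise<boolean>;
  containerState: () => Promise<ContainerState>;
  portInUse: () => Promise<boolean>;
}

async function classifyGatewayFailure(run: ProbeRunners): Promise<FailureLayer> {
  // Later probes are skipped once an earlier layer already explains the failure.
  if (!(await run.dockerInfo())) return decideLayer(false, "unknown", false);
  const container = await run.containerState();
  // The port is only probed when the container has exited; while it runs,
  // the gateway itself is expected to hold the port.
  const portInUse = container === "exited" ? await run.portInUse() : false;
  return decideLayer(true, container, portInUse);
}

// Placeholder header wording; the real text is for the implementer to choose.
const LAYER_HEADERS: Record<FailureLayer, string> = {
  docker_unreachable: "Docker daemon is not reachable",
  container_exited_port_conflict: "Gateway container exited and the gateway port is held by another process",
  container_exited: "Gateway container has exited",
  gateway_unreachable: "Gateway container is running but the gateway API does not respond",
  unknown: "Gateway failure could not be classified",
};

function formatLayerHeader(layer: FailureLayer): string {
  return `[${layer}] ${LAYER_HEADERS[layer]}`;
}
```

Splitting the pure decision (decideLayer) from the probing wrapper keeps the ordering logic testable with plain values, while the wrapper decides which probes actually need to run.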
Implementation notes
New helper src/lib/actions/sandbox/gateway-failure-classifier.ts (or extend gateway-state.ts).
Port probe via Node's net.connect() with a short timeout — works cross-platform (Linux/macOS/WSL) without depending on ss / lsof / netstat.
Container probe via docker ps + docker ps -a with short timeouts; gracefully degrade to unknown if Docker is itself unreachable (already covered by step 1).
Unit-testable in isolation: classifier takes injected runners for docker info, docker ps, port-probe so tests can simulate each layer.
The container name openshell-cluster-nemoclaw is hard-coded in NemoClaw's gateway start path; treat the same string as the fixed probe target. If we ever parameterize the gateway name, the classifier can read it from the same source.
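As a rough illustration of the net.connect() port probe mentioned in the notes above, a helper might look like the sketch below. The function name isPortInUse, the default host 127.0.0.1, and the 500 ms timeout are assumptions made for this example; the gateway port itself would come from wherever NemoClaw already defines it.

```typescript
import * as net from "node:net";

// Hypothetical helper: resolves true if something accepts a TCP connection on
// host:port within timeoutMs, and false on refusal, any socket error, or timeout.
function isPortInUse(
  port: number,
  host = "127.0.0.1", // assumption: the gateway port is probed on loopback
  timeoutMs = 500,    // assumption: "short timeout" taken as half a second
): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    let settled = false;

    const finish = (inUse: boolean) => {
      if (settled) return; // only the first outcome counts
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve(inUse);
    };

    // Explicit timer so a silently dropped connection attempt cannot hang the probe.
    const timer = setTimeout(() => finish(false), timeoutMs);

    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}
```

A successful connect only shows that some process is listening; it does not identify the process, so the caller still has to combine it with the container state to conclude that the listener is foreign.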
… test/repro-2666-silent-list-status.test.ts) per layer.
Out of scope
nemoclaw list does not need layer classification: its contract is "always show the registry," which #3270 ("fix(cli): keep status and list output visible when gateway probe fails (#2666)") already delivers.
Definition of done
status prints a clearly-named layer header in each of the four classified states.
Surfaced from #2666 / #3270.