Brev onboard UI: OpenClaw preflight false red, openshell exec argv, SSE/nginx + cloudflared EOF #2258

@ofunk-nvidia

Description

Summary

On Brev / brevlab VMs running the NemoClaw onboard UI (Next.js) behind nginx and cloudflared, we observed:

  1. OpenClaw stays red in the dashboard SERVICES panel even when the gateway is healthy inside the sandbox.
  2. openshell sandbox exec must be invoked as openshell sandbox exec -n <NAME> -- <cmd>. The positional form exec <name> curl … treats <name> as the remote command and falls back to the last-used sandbox, which targets the wrong sandbox or hangs.
  3. Host 127.0.0.1:18789 is reachable only after openshell forward start --background 18789 <sandbox>. Preflight should start the forward before relying on host probes, because exec can hang on some gateways.
  4. Sandbox phase: treat Running like Ready for “agent live” so that the health fallbacks run.
  5. Preflight: probe GET /health (with an optional Bearer token), not only /, which returns 401 even while the service is healthy.
  6. Brev public URL: tunnel hostnames may use the nemoclaw0- prefix or the legacy openclaw0- prefix. Prefer the hostname detected from journald; BREVLAB_DASHBOARD_PREFIX is optional.
  7. cloudflared reports unexpected EOF / context canceled on /api/logs (SSE). Cause: nginx's default proxy_read_timeout. Mitigations: long proxy_read_timeout / proxy_send_timeout for location /api/, SSE ": ping" keep-alive comments, and X-Accel-Buffering: no.
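Items 2, 4, and 5 can be sketched as small pure helpers. This is a minimal illustration, not the onboard-ui code: the function names are hypothetical, and the case-insensitive phase comparison is an assumption.

```typescript
// Hypothetical helpers; names are illustrative and not taken from onboard-ui.

/** Item 2: build argv for `openshell sandbox exec -n <NAME> -- <cmd>`. */
function sandboxExecArgv(sandboxName: string, command: string[]): string[] {
  if (sandboxName.trim() === "") throw new Error("sandbox name is required");
  if (command.length === 0) throw new Error("command is required");
  // `-n` names the sandbox explicitly; `--` separates the remote command,
  // so the sandbox name can never be mistaken for the command itself.
  return ["sandbox", "exec", "-n", sandboxName, "--", ...command];
}

/** Item 4: treat Running like Ready when deciding whether the agent is live. */
function isAgentLivePhase(phase: string): boolean {
  // Assumption: phases are compared case-insensitively.
  const p = phase.trim().toLowerCase();
  return p === "ready" || p === "running";
}

/** Item 5: request headers for GET /health with an optional Bearer token. */
function healthProbeHeaders(token?: string): Record<string, string> {
  return token ? { Authorization: `Bearer ${token}` } : {};
}
```

The argv array is meant to be passed to a process spawner without a shell, which avoids quoting problems in the remote command.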

Documentation

  • docs/troubleshooting/brev-onboard-ui-openclaw-dashboard.md (new)
  • Cross-links from docs/deployment/deploy-to-remote-gpu.md and docs/reference/troubleshooting.md

Application / image (may live outside this repo)

  • onboard-ui: app/api/preflight/route.ts, lib/services/openclaw.ts, lib/services/openshell.ts, components/organisms/SystemVisualizer.tsx, app/api/logs/route.ts
  • nemoclaw-nginx.conf: /api/ proxy timeouts

Ask

Confirm long-term home for onboard-ui + nemoclaw-nginx.conf (monorepo vs image-only) and align shipping with Brev launchables.

Reporter: @ofunk-nvidia (Brev VM validation)

Reproduction Steps

  1. Provision or use a Brev / brevlab VM with the NemoClaw onboard UI (Next.js on :3000), nginx on 127.0.0.1:80 → Next, and cloudflared exposing the public nemoclaw0-…brevlab.com (or equivalent) URL.
  2. Complete onboarding so at least one sandbox exists (e.g. first row in openshell sandbox list shows Ready or Running).
  3. Do not run openshell forward start manually (simulate a fresh session where host :18789 is not forwarded).
  4. Open the dashboard (/dashboard) and call GET /api/preflight (or use the SERVICES panel).
  5. Observe: OpenClaw stays red (checks.openclaw.ok === false) while gateway / docker / openshell can still be green.
  6. (Optional) Open live logs (client uses EventSource on /api/logs?source=all). Watch cloudflared / VM logs: unexpected EOF or context canceled for requests to /api/logs or /api/agent, with origin http://localhost:80.
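For the /api/logs stream, the SSE-side mitigations (": ping" keep-alives and X-Accel-Buffering: no) could look roughly like the sketch below. The helper names and the 15-second interval are assumptions; the only requirement is that pings arrive more often than the proxy read timeout (60 s by default in nginx).

```typescript
// Hypothetical helpers for an SSE route; not the actual app/api/logs/route.ts.

// Assumption: 15 s, chosen to stay well under nginx's default 60 s proxy_read_timeout.
const PING_INTERVAL_MS = 15_000;

/** An SSE comment line. Lines starting with ":" are ignored by EventSource clients. */
function ssePing(): string {
  return ": ping\n\n";
}

/** Frame a payload as one SSE event; each payload line needs its own "data:" field. */
function sseData(payload: string): string {
  const lines = payload.split(/\r?\n/).map((line) => `data: ${line}`);
  return lines.join("\n") + "\n\n";
}

/** Response headers for the stream, including the nginx buffering opt-out. */
function sseResponseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    // Tells nginx not to buffer this response, so events reach the client promptly.
    "X-Accel-Buffering": "no",
  };
}
```

A route handler would write ssePing() on a timer of PING_INTERVAL_MS and clear that timer when the client disconnects.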

Workaround that turns OpenClaw green:
openshell forward start --background 18789
then reload preflight — checks.openclaw becomes ok: true.
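The ordering this workaround implies (start the forward first, then probe the host port) can be sketched as a small planner. The names are hypothetical, and how the caller learns whether a forward is already active is not covered by this report, so it is passed in as a plain boolean.

```typescript
// Hypothetical preflight planner; not the actual app/api/preflight/route.ts.

type PreflightStep =
  | { kind: "forward"; argv: string[] }
  | { kind: "probe"; url: string };

/** argv for `openshell forward start --background <port> [<sandbox>]`. */
function forwardStartArgv(port: number, sandbox?: string): string[] {
  const argv = ["forward", "start", "--background", String(port)];
  if (sandbox) argv.push(sandbox);
  return argv;
}

/** Start the forward (if needed) before probing GET /health on the host. */
function openclawPreflightPlan(
  sandbox: string | undefined,
  forwardActive: boolean, // assumption: the caller determines this somehow
  port = 18789,
): PreflightStep[] {
  const steps: PreflightStep[] = [];
  if (!forwardActive) {
    steps.push({ kind: "forward", argv: forwardStartArgv(port, sandbox) });
  }
  steps.push({ kind: "probe", url: `http://127.0.0.1:${port}/health` });
  return steps;
}
```

Returning a plan rather than running commands directly keeps the ordering easy to unit test.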

Environment

| Item | Value (example / fill in) |
|------|---------------------------|
| **Host** | Brev launchable VM (`nemoclaw-*-inst-*` on GCP; Linux `6.x`, Ubuntu) |
| **Public URL** | `https://nemoclaw0-<brev-env-id>.brevlab.com` (Cloudflare Access in front; sign-in may be required for browser) |
| **Ingress** | `cloudflared` → `http://127.0.0.1:80` (nginx) → Next onboard UI `:3000` |
| **OpenShell** | e.g. `0.0.24` (`openshell --version`) |
| **NemoClaw CLI** | e.g. `0.0.7` (`nemoclaw --version`) |
| **Docker** | e.g. `28.x` (`docker --version`) |
| **Node (onboard UI)** | e.g. `22.x` |
| **Python** | e.g. `3.10.x` |
| **Onboard UI** | Next.js **16.x** (standalone `server.js`, systemd `nemoclaw-onboard-ui.service`, `PORT=3000`) |
| **nginx** | Site `nemoclaw` from `nemoclaw-nginx.conf`: `/api/` → `:3000`, `/` → OpenClaw `:18789` with fallback |
| **Sandboxes** | `openshell sandbox list` shows at least one **Ready** / **Running**; OpenClaw listens **inside** sandbox on `127.0.0.1:18789` |
| **CHAT_UI_URL** | Set or unset per image; affects baked `allowedOrigins` / dashboard URL hints |
| **GPU** | CPU-only variant possible (`provision.json` `variant: cpu`) |

Paste outputs if useful:
uname -a
openshell --version
nemoclaw --version
docker --version
node --version
cat /etc/nemoclaw/provision.json 2>/dev/null | head -c 2000
curl -sS http://127.0.0.1:3000/api/preflight | jq '.checks, .provision.components'
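When reading the preflight output from the last command, a small helper can list which entries under .checks are failing. This sketch assumes every check has the same { ok: boolean } shape as checks.openclaw; the interface and function names are hypothetical.

```typescript
// Hypothetical types; only `checks.openclaw.ok` is confirmed by this report.
interface CheckResult {
  ok: boolean;
}

interface PreflightResponse {
  checks: Record<string, CheckResult>;
}

/** Return the names of checks whose `ok` flag is false, sorted alphabetically. */
function failingChecks(preflight: PreflightResponse): string[] {
  return Object.entries(preflight.checks)
    .filter(([, result]) => !result.ok)
    .map(([name]) => name)
    .sort();
}

// Illustrative input mirroring the reported symptom (not captured output).
const sample: PreflightResponse = {
  checks: {
    gateway: { ok: true },
    docker: { ok: true },
    openshell: { ok: true },
    openclaw: { ok: false },
  },
};
console.log(failingChecks(sample));
```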

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

    Labels

      • area: cli (Command line interface, flags, terminal UX, or output)
      • area: install (Install, setup, prerequisites, or uninstall flow)
      • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
      • platform: brev (Affects Brev hosted development environments)
