Feat/fix dashboard timeout error display#85815
Conversation
|
Codex review: needs maintainer review before merge. Latest ClawSweeper review: 2026-05-24 01:01 UTC / May 23, 2026, 9:01 PM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
PR Surface View PR surface stats
Summary Reproducibility: yes. Current main can throw synchronously after PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Merge the narrow gateway broadcast fix after required CI and maintainer review, while leaving the separate agent-started timeout path to its own PR. Do we have a high-confidence way to reproduce the issue? Yes. Current main can throw synchronously after Is this the best way to solve the issue? Yes. Mirroring the existing async Label changes:
Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against acf265d4d51d. |
|
ClawSweeper PR egg ✨ Hatched: 🌱 uncommon Sunspot Signal Puff Hatch commandComment Hatchability rules:
Rarity: 🌱 uncommon. What is this egg doing here?
|
… barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815.
|
Fixed the P1 finding: changed the test type import to the local server-methods barrel ( Verified locally with the exact command from the review: Result: Pushed as @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
e7d953f to
6ad1283
Compare
… barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815.
… barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815.
6ad1283 to
b677b72
Compare
|
Landed. Proof before merge: focused gateway chat error-broadcast regression test, autoreview clean, and live CI green. Merge commit: a668982 Thanks @scotthuang. |
|
@clawsweeper hatch |
|
🦞👀 Reason: hatch requires an open pull request. |
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
…UI error (#4437) ## Release target Refs #4434. This PR targets `v0.0.55`; #4434 should remain open until this OpenClaw upgrade is merged, tagged, and verified in the shipped `.55` release. ## Why this resolves #4434 NemoClaw #4434 reports that `openclaw tui` keeps an active spinner and `connected` status with no visible terminal error when the NVIDIA inference endpoint is unreachable. This branch moves the sandbox OpenClaw pin from `2026.5.22` to `2026.5.27` with npm integrity: `sha512-2N93zhdAo88KAbHt6T7KvYXf4s7XIkYXBgv1npYpn7e1Y9FvrtgtpsA38my9rtFW+70uXEojRPX5/OqnuDqJPw==` Upstream proof: - openclaw/openclaw#85815 and openclaw/openclaw@a668982 fix the missing `broadcastChatError()` call for synchronous `chat.send` failures. - openclaw/openclaw#84945 and openclaw/openclaw#85355 show the broader real class of gateway errors not being broadcast to clients. ## Changes - Bumps `Dockerfile`, `Dockerfile.base`, `agents/openclaw/manifest.yaml`, and package metadata to OpenClaw `2026.5.27`. - Updates OpenClaw pin/integrity tests, deployment/version tests, and the existing TUI chat-correlation E2E assertion. - Updates `scripts/patch-openclaw-chat-send.js` so NemoClaw's chat-send run-id preservation shim still recognizes the compiled OpenClaw `2026.5.27` followup-runner admission shape. - Adds a CI-safe Vitest contract harness for the #4434 TUI failure signature and expected visible-error behavior. - Adds the privileged live repro: `test/e2e/test-issue-4434-tui-unreachable-inference.sh`. - Wires that live repro into `nightly-e2e.yaml` as `issue-4434-tui-unreachable-inference-e2e`, including selective dispatch, public-install target-ref handling, failure artifacts, aggregate reporting coverage, and trusted workflow-script checkout for the secret/sudo firewall job. ## Local validation - `npm ci` - `npm ci --include=dev` - `npm run build:cli` - `npm run typecheck:cli` - `npm test -- test/fetch-guard-patch-regression.test.ts test/openclaw-chat-send-patch.test.ts test/openclaw-tui-chat-correlation.test.ts test/issue-4434-tui-unreachable-inference.test.ts` - `npm test -- src/lib/sandbox/version.test.ts src/lib/verify-deployment.test.ts` - `npm test -- test/validate-e2e-coverage.test.ts test/e2e-advisor-dispatch.test.ts test/e2e-script-workflow.test.ts test/issue-4434-tui-unreachable-inference.test.ts nemoclaw/src/package-metadata.test.ts` - `shellcheck test/e2e/test-issue-4434-tui-unreachable-inference.sh` - `bash -n test/e2e/test-issue-4434-tui-unreachable-inference.sh` - `bash -n test/e2e/test-openclaw-tui-chat-correlation.sh` - `NEMOCLAW_ISSUE_4434_LIVE=0 bash test/e2e/test-issue-4434-tui-unreachable-inference.sh` - `git diff --check` - Fresh `npm pack openclaw@2026.5.27` dist smoke with `node scripts/patch-openclaw-chat-send.js "$tmp/package/dist"` - Runtime Docker smoke: `docker build -f Dockerfile --build-arg BASE_IMAGE=ghcr.io/nvidia/nemoclaw/sandbox-base:latest -t nemoclaw-issue4434-openclaw-runtime-smoke:2026-5-27 .` - Runtime image version smoke: `docker run --rm --entrypoint openclaw nemoclaw-issue4434-openclaw-runtime-smoke:2026-5-27 --version` -> `OpenClaw 2026.5.27 (27ae826)` - Base-style OpenClaw install smoke in Docker for the `2026.5.27` npm integrity and install path. - Pre-commit suite on `98e0a763efe0925f26cf89129cd4ab63cb0b05f3`: passed, including CLI/plugin coverage hooks. - Pre-push suite reran CLI/plugin coverage; one unrelated `test/nemoclaw-start.test.ts` case timed out during the full concurrent run, then passed directly with `npx vitest run --project cli test/nemoclaw-start.test.ts -t "captures baseline snapshot when openclaw.json is valid and no baseline exists"`. ## Nightly proof Targeted nightly E2E passed on the final PR head: - Run: https://github.com/NVIDIA/NemoClaw/actions/runs/26586935610 - Job: https://github.com/NVIDIA/NemoClaw/actions/runs/26586935610/job/78335355241 - Head: `5f549f661fe81b485f75903146512af4225d4698` - Job: `issue-4434-tui-unreachable-inference-e2e` - Duration: 8m27s The live job runs the requested end-to-end flow on Linux with the repository `NVIDIA_API_KEY` secret: public install from this PR ref, cloud onboard with NVIDIA Endpoints and `nvidia/nemotron-3-super-120b-a12b`, pre-block `nemoclaw <sandbox> status`, pre-block `nemoclaw <sandbox> connect --probe-only`, exact `DOCKER-USER` `DROP` rules for `75.2.113.119` and `99.83.136.103`, in-sandbox endpoint-block verification, `openclaw tui`, `hello`, and final TUI assertion. The passing assertion was: `PASS: openclaw tui surfaced a visible unreachable-inference error and stopped the spinner` The dispatch command for reruns while this job only exists on the PR branch is: ```bash gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \ --ref issue-4434-openclaw-2026-5-27-proof \ -f target_ref=5f549f661fe81b485f75903146512af4225d4698 \ -f pr_number=4437 \ -f jobs=issue-4434-tui-unreachable-inference-e2e ``` ## Remaining release note - Baseline: #4434 already captures the `v0.0.53` / OpenClaw `2026.5.22` spinner/no-error behavior after the exact firewall block. I did not rerun the mutating baseline repro from this macOS host. - Exact `Dockerfile.base` build was blocked locally because this Docker install does not provide `docker buildx`, while `Dockerfile.base` uses BuildKit `RUN --mount`. The runtime Docker path and a base-style OpenClaw install smoke both passed. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added an opt-in live E2E repro and new unit/integration tests for TUI behavior when inference endpoints are unreachable, validating visible error reporting, spinner shutdown, and compatibility with updated runtime/followup-runner shapes. * **Chores** * Bumped OpenClaw/runtime to 2026.5.27 across builds, manifests, docs, and test expectations. * **Chores / CI** * Added a selective/nightly E2E job to run the repro, include its results in aggregated reports, and upload sanitized logs with sensitive tokens redacted. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: cjagwani <cjagwani@nvidia.com>
* fix(gateway): broadcast error to UI when chat.send fails synchronously * test(gateway): verify broadcastChatError is called on chat.send error * test(gateway): import GatewayRequestContext from local server-methods barrel Fixes the chat error-broadcast regression test so it can resolve its type import. The previous `../types.js` path does not exist in the gateway tree; the shared types are re-exported from `src/gateway/server-methods/types.ts`, so the test must use `./types.js`. Addresses ClawSweeper review on PR openclaw#85815. --------- Co-authored-by: scotthuang <scotthuang@tencent.com>
Summary
chat.sendRPC handler encounters a synchronous error (e.g., LLM timeout), thecatch (err)block only callsrespond()to return an RPC error, but does NOT callbroadcastChatError(). Dashboard UI relies on WebSocketchatevents (state: "error") to update the interface, so it gets stuck with a loading spinner and no error message.broadcastChatError()call in the synchronouscatch (err)block (line 3158) ofchat.sendhandler, mirroring the async.catch()path (line 3117) that already had this call.chat.sendwas fixed.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
catch (err)block inchat.sendhandler (line 3158) was missing thebroadcastChatError()call that sends WebSocket events to Dashboard UI. Onlyrespond()was called, which returns an RPC error but doesn't update the UI state.chatevents withstate: "error"to display errors. WithoutbroadcastChatError(), the UI never receives the error event and stays in loading state forever.Regression Test Plan (if applicable)
src/gateway/server-methods/chat.error-broadcast.test.tschat.sendencounters a synchronous error (e.g., session load failure),broadcastChatError()must be called so Dashboard UI receives the error event.src/gateway/server-methods/chat.directive-tags.test.ts(similar patterns exist)User-visible / Behavior Changes
chat.sendfails synchronously, instead of hanging with a loading spinner forever.Diagram (if applicable)
Security Impact (required)
No)No)No)No)No)Yes, explain risk + mitigation: N/ARepro + Verification
Environment
Steps
http://localhost:19001timeout-test(triggers injected error for testing)Expected
chatevent withstate: "error"anderrorMessageActual
Evidence
Attach at least one:
Video: Before fix (UI stuck with loading spinner)
2026-05-24.3.55.49.mov
Before fix (from gateway.log):
UI hangs, no error displayed.
Video: After fix (error message displayed correctly)
2026-05-24.3.43.11.mov
After fix:
UI shows error message correctly.
Real behavior proof
chat.sendfails synchronously (e.g., LLM timeout), instead of hanging with a loading spinner forever.pnpm build→node dist/index.js gateway --port 19001), model hy3-preview via tencent-tokenhub, Dashboard UI athttp://localhost:19001.pnpm buildopenclaw-dev restart(port 19001, state dir~/.openclaw-dev/)http://localhost:19001hello timeout-test(triggers injected timeout error inchat.sendhandler)broadcastChatError()was called (via WebSocketchatevent withstate: "error"). Test filesrc/gateway/server-methods/chat.error-broadcast.test.tspasses.chatevent withstate: "error"anderrorMessage: "Error: LLM timeout". Loading spinner stops. Error message appears in chat. Test passes:broadcastChatError()is called when synchronous error occurs inchat.send.dashboard:session key).Human Verification (required)
What you personally verified (not just CI), and how:
timeout-testmessage in Dashboard UI athttp://localhost:19001, confirmed error message appears instead of hanging. Verified WebSocket message format matches expected structure (runId,sessionKey,state: "error",errorMessage). Ran unit test:pnpm test src/gateway/server-methods/chat.error-broadcast.test.ts.broadcastChatError()is called with correct parameters (runId,sessionKey,errorMessage).broadcastChatError()calls, real LLM timeout scenario (only tested with injected error), performance impact of additional broadcast call.Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes)No)No)Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.broadcastChatError()might be called twice if both sync and async paths trigger (e.g., error in sync path followed by async cleanup error).broadcastChatError()is idempotent for the samerunIddue toagentRunSeqtracking and dedup. The WebSocket clients handle duplicate events gracefully.chat.sendfailures, not just timeouts.