
fix(gateway): broadcast agent-run error payloads #85355

Merged
vincentkoc merged 1 commit into main from fix/gateway-agent-error-payload-broadcast on May 22, 2026

Conversation

@vincentkoc

Member

Summary

  • broadcast returned isError agent-run payloads as terminal chat error events after an agent has started
  • mark the chat:<runId> dedupe entry failed for those returned error payloads
  • add a regression covering post-agent-start idle-timeout-style error delivery without mirroring assistant transcript entries
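
The three bullets above can be read as one post-dispatch decision. The following is a minimal sketch of that decision, not the code from src/gateway/server-methods/chat.ts: the `ReplyPayload`, `ChatErrorEvent`, and `DedupeEntry` shapes, the `finishAgentRun` helper, the fallback message, and the choice to pick the first error payload are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real Gateway types may differ.
type ReplyPayload = { text?: string; isError?: boolean };
type ChatErrorEvent = { runId: string; state: "error"; errorMessage?: string };
type DedupeEntry = { ok: boolean; error?: string };

// Hypothetical helper modelling the post-dispatch branch once an agent run has started.
function finishAgentRun(
  runId: string,
  agentRunStarted: boolean,
  deliveredReplies: ReplyPayload[],
  broadcast: (event: ChatErrorEvent) => void,
  dedupe: Map<string, DedupeEntry>,
): void {
  if (!agentRunStarted) return; // other paths are out of scope for this sketch

  // A returned (not thrown) error payload is flagged with isError: true.
  const errorPayload = deliveredReplies.find((reply) => reply.isError === true);
  const key = `chat:${runId}`;

  if (errorPayload) {
    const errorMessage = errorPayload.text ?? "agent run failed";
    // Terminal chat error event for connected clients.
    broadcast({ runId, state: "error", errorMessage });
    // Record the run as failed instead of ok: true.
    dedupe.set(key, { ok: false, error: errorMessage });
    return;
  }

  dedupe.set(key, { ok: true });
}

const events: ChatErrorEvent[] = [];
const dedupe = new Map<string, DedupeEntry>();
finishAgentRun(
  "run-1",
  true,
  [{ text: "LLM idle timeout (120s): no response from model", isError: true }],
  (event) => events.push(event),
  dedupe,
);
```

In this sketch the assistant transcript is never touched, which mirrors the stated goal of delivering the error without mirroring assistant transcript entries.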

Fixes #84945.

Verification

  • node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts (2 files, 152 tests passed)
  • node_modules/.bin/oxfmt --check src/gateway/server-methods/chat.ts src/gateway/server-methods/chat.directive-tags.test.ts CHANGELOG.md
  • git diff --check origin/main...HEAD

Real behavior proof

Behavior addressed: returned isError payloads after agentRunStarted=true were persisted/logged but not broadcast as terminal chat errors.
Real environment tested: local Gateway server-method unit harness in a Codex worktree, with dispatchInboundMessage triggering onAgentRunStart and returning a final error payload.
Exact steps or command run after this patch: node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts.
Evidence after fix: the new regression observes a chat event with state: "error", errorMessage: "LLM idle timeout (120s): no response from model", and a failed chat:<runId> dedupe snapshot.
Observed result after fix: connected Gateway clients receive the terminal error event, while normal agent-run final text is still not mirrored into assistant transcript entries.
What was not tested: a live ACP/Ki-Agents 120s idle timeout run; CI and the focused Gateway harness cover the source-level failure path.

@openclaw-barnacle openclaw-barnacle Bot added the labels app: web-ui (App: web-ui), gateway (Gateway runtime), size: S, and maintainer (Maintainer-authored PR) on May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

Codex review: needs maintainer review before merge.

Latest ClawSweeper review: 2026-05-22 12:47 UTC / May 22, 2026, 8:47 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR updates Gateway chat handling so returned isError payloads after an agent run starts are broadcast as terminal chat error events, marks the chat dedupe entry failed, and adds a focused regression plus changelog entry.

Reproducibility: yes. Source-level reproduction is high confidence: current main stores block/final replies in deliveredReplies, then the agentRunStarted branch only updates the user transcript and marks dedupe ok, while the PR harness simulates onAgentRunStart plus a final isError payload.

PR rating
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Summary: Focused, reviewable patch with a regression for the source-level failure path and no blocking findings; the remaining confidence gap is live transport proof and duplicate-PR coordination.

Rank-up moves:

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Not applicable: The external contributor proof gate does not apply to this maintainer-authored PR; the PR body provides focused unit-harness proof but not a live ACP timeout run.

Risk before merge

  • A live ACP/Ki-Agents 120s idle-timeout run was not exercised; the remaining merge risk is whether real connected clients observe exactly the same terminal chat error event as the focused Gateway harness.
  • There are open external PRs for the same linked bug, especially fix(gateway): surface resolved chat errors #84953, so maintainers should avoid landing duplicate competing fixes.

Maintainer options:

  1. Land with focused Gateway proof (recommended)
    If required checks stay green, maintainers can accept the remaining live-timeout proof gap because the patch uses the existing chat error event contract and covers the source-level failure path.
  2. Request live transport proof
    Before merge, maintainers can ask for a short ACP/WebChat smoke that injects or triggers the returned isError payload and shows the connected client receiving the terminal error.
  3. Pick the canonical duplicate branch
    Pause this branch if maintainers prefer the broader existing fix in fix(gateway): surface resolved chat errors #84953, then close or supersede the other PR to avoid drift.

Next step before merge
No automated repair is needed; this protected maintainer PR has no blocking review findings and should proceed through normal maintainer merge/CI handling.

Security
Cleared: No concrete security or supply-chain regression found; the diff touches Gateway event handling, a regression test, and changelog only, with no new dependencies, workflows, permissions, or secret handling.

Review details

Best possible solution:

Land one canonical Gateway fix that broadcasts returned agent-run error payloads, preserves the transcript ownership boundary, and then close the linked issue while superseding duplicate PRs.

Do we have a high-confidence way to reproduce the issue?

Yes. Source-level reproduction is high confidence: current main stores block/final replies in deliveredReplies, then the agentRunStarted branch only updates the user transcript and marks dedupe ok, while the PR harness simulates onAgentRunStart plus a final isError payload.

Is this the best way to solve the issue?

Yes. The PR is the narrow maintainable fix: it inspects delivered error payloads in the existing post-dispatch path, reuses broadcastChatError, updates dedupe consistently, and avoids mirroring normal Pi assistant turns into transcript history.

Label changes:

  • add P1: The PR fixes a real Gateway/ACP workflow where users can lose the terminal error event and see a silently stopped response.
  • add merge-risk: 🚨 message-delivery: The patch changes terminal chat error delivery and dedupe behavior for connected Gateway clients, which focused tests cover but live transport proof has not exercised.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🌊 off-meta tidepool, patch quality is 🐚 platinum hermit, and the patch is focused and reviewable, with a regression for the source-level failure path and no blocking findings; the remaining confidence gap is live transport proof and duplicate-PR coordination.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate does not apply to this maintainer-authored PR; the PR body provides focused unit-harness proof but not a live ACP timeout run.

Label justifications:

  • P1: The PR fixes a real Gateway/ACP workflow where users can lose the terminal error event and see a silently stopped response.
  • merge-risk: 🚨 message-delivery: The patch changes terminal chat error delivery and dedupe behavior for connected Gateway clients, which focused tests cover but live transport proof has not exercised.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🌊 off-meta tidepool, patch quality is 🐚 platinum hermit, and the patch is focused and reviewable, with a regression for the source-level failure path and no blocking findings; the remaining confidence gap is live transport proof and duplicate-PR coordination.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external contributor proof gate does not apply to this maintainer-authored PR; the PR body provides focused unit-harness proof but not a live ACP timeout run.

What I checked:

  • Current main drops returned agent-run error payloads: On current main, deliveredReplies records block/final payloads, but once agentRunStarted is true the post-dispatch branch only emits the user transcript update and then writes a successful chat dedupe entry. (src/gateway/server-methods/chat.ts:2815, 111bad106544)
  • Existing protocol supports the intended event: The Gateway protocol already defines ChatErrorEventSchema with state: "error" and optional errorMessage, so the PR uses an existing client-facing event shape rather than adding a new protocol contract. (src/gateway/protocol/schema/logs-chat.ts:121, 111bad106544)
  • Agent timeout path returns an error payload: The Pi embedded runner timeout path returns a normal reply payload with isError: true, matching the issue's non-throwing flow that would bypass the .catch() broadcast path. (src/agents/pi-embedded-runner/run.ts:2775, 111bad106544)
  • PR patch broadcasts and marks dedupe failure: The PR detects returned isError payloads after an agent starts, calls broadcastChatError, and writes a failed chat:<runId> dedupe payload instead of ok: true. (src/gateway/server-methods/chat.ts:2902, a230fcc4cc38)
  • Regression covers terminal error event without transcript mirroring: The new test simulates onAgentRunStart followed by a final isError payload and expects a chat error event, failed dedupe snapshot, user transcript update, and no assistant transcript mirror. (src/gateway/server-methods/chat.directive-tags.test.ts:970, a230fcc4cc38)
  • Maintainer discussion keeps the bug independent: A maintainer comment on the linked issue says not to close it with fix(codex): recover final text after prompt timeout #84993 and identifies this Gateway chat delivery path as the next narrow repair candidate.
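
The checks above turn on one detail: a payload that is returned with isError: true resolves the promise, so a `.catch()` handler never runs for it. The sketch below illustrates only that general Promise behavior; `runAgent` and `dispatch` are hypothetical stand-ins, not functions from the OpenClaw source.

```typescript
type ReplyPayload = { text: string; isError?: boolean };

// Hypothetical runner that reports a timeout by returning a payload, not by throwing.
async function runAgent(): Promise<ReplyPayload> {
  return { text: "LLM idle timeout (120s): no response from model", isError: true };
}

// Hypothetical dispatcher that records which handler ran.
async function dispatch(): Promise<string[]> {
  const log: string[] = [];
  await runAgent()
    .then((reply) => {
      // The error payload arrives here, on the success path.
      log.push(`resolved isError=${reply.isError === true}`);
    })
    .catch(() => {
      // Only reached if runAgent rejects or the handler above throws.
      log.push("caught");
    });
  return log;
}

const logPromise = dispatch();
logPromise.then((log) => console.log(log));
```

Any error broadcast that lives only in the `.catch()` branch is therefore skipped for this kind of failure, which is why the resolved payload itself has to be inspected.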

Likely related people:

  • Peter Steinberger: Current blame for the Gateway post-dispatch branch, deliveredReplies handling, and the timeout payload return path points to the same recent local-history commit. (role: recent area contributor; confidence: medium; commits: 4ee8a2ac2ea5; files: src/gateway/server-methods/chat.ts, src/gateway/server-methods/chat.directive-tags.test.ts, src/agents/pi-embedded-runner/run.ts)
  • vincentkoc: The live PR and linked issue are assigned to this member account, and the issue discussion includes their maintainer routing note that this Gateway delivery bug is independent of adjacent fixes. (role: current follow-up owner; confidence: medium; commits: a230fcc4cc38; files: src/gateway/server-methods/chat.ts, src/gateway/server-methods/chat.directive-tags.test.ts, CHANGELOG.md)
  • JulyanXu: The issue discussion and open related PRs include source-level analysis and proposed fixes for the same returned-error-payload delivery path. (role: adjacent contributor; confidence: low; files: src/gateway/server-methods/chat.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 111bad106544.

@vincentkoc vincentkoc self-assigned this May 22, 2026
@clawsweeper clawsweeper Bot added the labels rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) on May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 💎 rare Mossy Merge Sprite

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 💎 rare.
Trait: polishes edge cases.
Image traits: location green-check meadow; accessory lint brush; palette moonlit blue and soft silver; mood celebratory; pose balancing on a branch marker; shell glossy opal shell; lighting bright celebratory glints; background smooth stones and checkmarks.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@vincentkoc vincentkoc merged commit 07e61fc into main May 22, 2026
143 of 148 checks passed
@vincentkoc vincentkoc deleted the fix/gateway-agent-error-payload-broadcast branch May 22, 2026 12:58
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
cv pushed a commit to NVIDIA/NemoClaw that referenced this pull request Jun 4, 2026
…UI error (#4437)

## Release target

Refs #4434. This PR targets `v0.0.55`; #4434 should remain open until
this OpenClaw upgrade is merged, tagged, and verified in the shipped
`.55` release.

## Why this resolves #4434

NemoClaw #4434 reports that `openclaw tui` keeps an active spinner and
`connected` status with no visible terminal error when the NVIDIA
inference endpoint is unreachable. This branch moves the sandbox
OpenClaw pin from `2026.5.22` to `2026.5.27` with npm integrity:


`sha512-2N93zhdAo88KAbHt6T7KvYXf4s7XIkYXBgv1npYpn7e1Y9FvrtgtpsA38my9rtFW+70uXEojRPX5/OqnuDqJPw==`

Upstream proof:

- openclaw/openclaw#85815 and
openclaw/openclaw@a668982
fix the missing `broadcastChatError()` call for synchronous `chat.send`
failures.
- openclaw/openclaw#84945 and
openclaw/openclaw#85355 show the broader real
class of gateway errors not being broadcast to clients.

## Changes

- Bumps `Dockerfile`, `Dockerfile.base`,
`agents/openclaw/manifest.yaml`, and package metadata to OpenClaw
`2026.5.27`.
- Updates OpenClaw pin/integrity tests, deployment/version tests, and
the existing TUI chat-correlation E2E assertion.
- Updates `scripts/patch-openclaw-chat-send.js` so NemoClaw's chat-send
run-id preservation shim still recognizes the compiled OpenClaw
`2026.5.27` followup-runner admission shape.
- Adds a CI-safe Vitest contract harness for the #4434 TUI failure
signature and expected visible-error behavior.
- Adds the privileged live repro:
`test/e2e/test-issue-4434-tui-unreachable-inference.sh`.
- Wires that live repro into `nightly-e2e.yaml` as
`issue-4434-tui-unreachable-inference-e2e`, including selective
dispatch, public-install target-ref handling, failure artifacts,
aggregate reporting coverage, and trusted workflow-script checkout for
the secret/sudo firewall job.
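
The expected visible-error behavior can be pictured as a small client-side state transition. The sketch below is an illustration only, not the NemoClaw harness or the OpenClaw TUI code: the `TuiState` shape, the `reduceChatEvent` function, and the `delta` and `final` event states are assumptions; only `state: "error"` with an optional `errorMessage` comes from the upstream event contract described in openclaw/openclaw#85355.

```typescript
// Hypothetical event union; only the "error" variant is taken from the upstream contract.
type ChatEvent =
  | { state: "delta"; text: string }
  | { state: "final"; text: string }
  | { state: "error"; errorMessage?: string };

// Hypothetical view state for a terminal chat client.
type TuiState = { spinnerActive: boolean; lines: string[] };

function reduceChatEvent(state: TuiState, event: ChatEvent): TuiState {
  switch (event.state) {
    case "delta":
      // Streaming output: keep the spinner running.
      return { spinnerActive: true, lines: [...state.lines, event.text] };
    case "final":
      // Normal completion: stop the spinner.
      return { spinnerActive: false, lines: [...state.lines, event.text] };
    case "error":
      // Terminal error: stop the spinner and show a visible message.
      return {
        spinnerActive: false,
        lines: [...state.lines, `error: ${event.errorMessage ?? "unknown error"}`],
      };
  }
}

const waiting: TuiState = { spinnerActive: true, lines: [] };
const afterError = reduceChatEvent(waiting, {
  state: "error",
  errorMessage: "inference endpoint unreachable",
});
```

If the error event is never broadcast, a client written like this stays in the `waiting` state, which matches the active-spinner symptom reported in #4434.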

## Local validation

- `npm ci`
- `npm ci --include=dev`
- `npm run build:cli`
- `npm run typecheck:cli`
- `npm test -- test/fetch-guard-patch-regression.test.ts
test/openclaw-chat-send-patch.test.ts
test/openclaw-tui-chat-correlation.test.ts
test/issue-4434-tui-unreachable-inference.test.ts`
- `npm test -- src/lib/sandbox/version.test.ts
src/lib/verify-deployment.test.ts`
- `npm test -- test/validate-e2e-coverage.test.ts
test/e2e-advisor-dispatch.test.ts test/e2e-script-workflow.test.ts
test/issue-4434-tui-unreachable-inference.test.ts
nemoclaw/src/package-metadata.test.ts`
- `shellcheck test/e2e/test-issue-4434-tui-unreachable-inference.sh`
- `bash -n test/e2e/test-issue-4434-tui-unreachable-inference.sh`
- `bash -n test/e2e/test-openclaw-tui-chat-correlation.sh`
- `NEMOCLAW_ISSUE_4434_LIVE=0 bash
test/e2e/test-issue-4434-tui-unreachable-inference.sh`
- `git diff --check`
- Fresh `npm pack openclaw@2026.5.27` dist smoke with `node
scripts/patch-openclaw-chat-send.js "$tmp/package/dist"`
- Runtime Docker smoke: `docker build -f Dockerfile --build-arg
BASE_IMAGE=ghcr.io/nvidia/nemoclaw/sandbox-base:latest -t
nemoclaw-issue4434-openclaw-runtime-smoke:2026-5-27 .`
- Runtime image version smoke: `docker run --rm --entrypoint openclaw
nemoclaw-issue4434-openclaw-runtime-smoke:2026-5-27 --version` ->
`OpenClaw 2026.5.27 (27ae826)`
- Base-style OpenClaw install smoke in Docker for the `2026.5.27` npm
integrity and install path.
- Pre-commit suite on `98e0a763efe0925f26cf89129cd4ab63cb0b05f3`:
passed, including CLI/plugin coverage hooks.
- Pre-push suite reran CLI/plugin coverage; one unrelated
`test/nemoclaw-start.test.ts` case timed out during the full concurrent
run, then passed directly with `npx vitest run --project cli
test/nemoclaw-start.test.ts -t "captures baseline snapshot when
openclaw.json is valid and no baseline exists"`.

## Nightly proof

Targeted nightly E2E passed on the final PR head:

- Run: https://github.com/NVIDIA/NemoClaw/actions/runs/26586935610
- Job:
https://github.com/NVIDIA/NemoClaw/actions/runs/26586935610/job/78335355241
- Head: `5f549f661fe81b485f75903146512af4225d4698`
- Job: `issue-4434-tui-unreachable-inference-e2e`
- Duration: 8m27s

The live job runs the requested end-to-end flow on Linux with the
repository `NVIDIA_API_KEY` secret: public install from this PR ref,
cloud onboard with NVIDIA Endpoints and
`nvidia/nemotron-3-super-120b-a12b`, pre-block `nemoclaw <sandbox>
status`, pre-block `nemoclaw <sandbox> connect --probe-only`, exact
`DOCKER-USER` `DROP` rules for `75.2.113.119` and `99.83.136.103`,
in-sandbox endpoint-block verification, `openclaw tui`, `hello`, and
final TUI assertion.

The passing assertion was:

`PASS: openclaw tui surfaced a visible unreachable-inference error and
stopped the spinner`

The dispatch command for reruns while this job only exists on the PR
branch is:

```bash
gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \
  --ref issue-4434-openclaw-2026-5-27-proof \
  -f target_ref=5f549f661fe81b485f75903146512af4225d4698 \
  -f pr_number=4437 \
  -f jobs=issue-4434-tui-unreachable-inference-e2e
```

## Remaining release note

- Baseline: #4434 already captures the `v0.0.53` / OpenClaw `2026.5.22`
spinner/no-error behavior after the exact firewall block. I did not
rerun the mutating baseline repro from this macOS host.
- Exact `Dockerfile.base` build was blocked locally because this Docker
install does not provide `docker buildx`, while `Dockerfile.base` uses
BuildKit `RUN --mount`. The runtime Docker path and a base-style
OpenClaw install smoke both passed.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
  * Added an opt-in live E2E repro and new unit/integration tests for TUI
    behavior when inference endpoints are unreachable, validating visible
    error reporting, spinner shutdown, and compatibility with updated
    runtime/followup-runner shapes.

* **Chores**
  * Bumped OpenClaw/runtime to 2026.5.27 across builds, manifests, docs,
    and test expectations.

* **Chores / CI**
  * Added a selective/nightly E2E job to run the repro, include its
    results in aggregated reports, and upload sanitized logs with sensitive
    tokens redacted.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: cjagwani <cjagwani@nvidia.com>
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

Labels

app: web-ui (App: web-ui), gateway (Gateway runtime), maintainer (Maintainer-authored PR), rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.), size: S, status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM idle timeout error silently dropped when agentRunStarted is true

1 participant