fix(gateway): broadcast idle timeout errors to clients after agent run started by JulyanXu · Pull Request #85176 · openclaw/openclaw

JulyanXu · 2026-05-22T02:06:27Z

Summary

When an LLM idle timeout occurs after the agent has started (e.g., after tool calls), the error is returned as a normal { text, isError: true } payload via the deliver callback — it is NOT thrown. The .then() handler at line 2882 only called emitUserTranscriptUpdate() in the agentRunStarted=true branch, never checking deliveredReplies for error payloads and never calling broadcastChatError. Connected clients received no error feedback — the response silently stopped.

This fix checks deliveredReplies for payloads with isError: true when agentRunStarted=true and !hasBeforeAgentRunGate. If error payloads are found, broadcastChatError is called to notify clients. If not, the existing emitUserTranscriptUpdate() path is preserved.

Closes #84945

Verification

Single file change: src/gateway/server-methods/chat.ts (20 lines added, 5 removed)
broadcastChatError is already used in the .catch() handler at line 2938 with the same signature
deliveredReplies is already filtered for isError in the !agentRunStarted branch (btwReplies)
Normal agent completions (no error payloads) continue through emitUserTranscriptUpdate() as before

Real behavior proof

Behavior addressed: LLM idle timeout after agent start silently drops — clients receive no error event
Real environment tested: Source-level verification against current main
Exact steps: Start agent session, agent makes tool calls, subsequent LLM call times out (120s idle) → after fix, client receives state: "error" chat event with the timeout message
Evidence after fix: broadcastChatError fires for timeout errors in the agent-started path
What was not tested: Live gateway with idle timeout triggered

…n started When an LLM idle timeout occurs after the agent has started (e.g., after tool calls), the error is returned as a normal payload with isError:true rather than thrown. The .then() handler only called emitUserTranscriptUpdate() in the agent-started branch, never checking deliveredReplies for error payloads and never calling broadcastChatError. Connected clients received no error feedback. Now checks deliveredReplies for isError payloads when agentRunStarted=true and calls broadcastChatError when found. Closes openclaw#84945

clawsweeper · 2026-05-22T02:08:40Z

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch changes chat.send post-dispatch handling to broadcast a chat error when an agent-started run returns delivered replies with isError: true.

Reproducibility: yes. for source-level reproduction: make dispatchInboundMessage call onAgentRunStart and then deliver a final isError payload; current main reaches only the transcript-update branch. I did not run a live gateway idle-timeout scenario in this read-only review.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦐 gold shrimp
Summary: The PR is a useful narrow fix attempt, but missing real behavior proof and the incorrect terminal dedupe state make it not merge-ready yet.

Rank-up moves:

Update the error-payload path to cache ok: false/status: "error" with the timeout summary.
Add a focused gateway regression test for an agent-started returned isError payload and dedupe status.
Add redacted live output, logs, a terminal screenshot, or a linked artifact showing the after-fix gateway error event in a real run.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body only describes source-level verification and says no live gateway idle-timeout run was tested, so external-contributor real behavior proof is still needed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

The new error broadcast still leaves chat:<runId> cached as ok: true/status: "ok", so duplicate chat.send calls or agent.wait can observe success for the same timed-out run.
The PR body describes source-level verification and explicitly says a live gateway idle-timeout scenario was not tested, so the external real-behavior proof gate is not met.

Maintainer options:

Fix terminal chat dedupe (recommended)
Update the error-payload path so the same run records ok: false and status: "error" with an error summary before merge.
Accept broadcast-only behavior
Maintainers could intentionally merge the visible error event first, but cached retry and wait observers may still see an incorrect successful terminal state until a follow-up lands.

Next step before merge
The PR needs contributor real-behavior proof and a dedupe-state fix before normal maintainer review can proceed; ClawSweeper should not open repair markers while the external proof gate is unmet.

Security
Cleared: The diff only changes gateway chat error-event handling and does not touch secrets, dependencies, workflows, install scripts, or supply-chain surfaces.

Review findings

[P2] Record the returned error in terminal dedupe — src/gateway/server-methods/chat.ts:2890-2895

Review details

Best possible solution:

Merge a fix that broadcasts the agent-started returned error payload and records the matching terminal chat dedupe state as an error, with focused gateway coverage and redacted real runtime proof.

Do we have a high-confidence way to reproduce the issue?

Yes for source-level reproduction: make dispatchInboundMessage call onAgentRunStart and then deliver a final isError payload; current main reaches only the transcript-update branch. I did not run a live gateway idle-timeout scenario in this read-only review.

Is this the best way to solve the issue?

No; the direction is right, but the current patch should also write the terminal chat dedupe entry as an error and add focused coverage. Source-only proof is not enough for this external PR before merge.

Label changes:

add P1: The linked bug affects a real gateway/ACP flow where users lose terminal timeout feedback from a running agent session.
add merge-risk: 🚨 message-delivery: The PR changes chat error delivery but can still cache the failed run as successful for duplicate sends or waiters.
add rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🦐 gold shrimp, and The PR is a useful narrow fix attempt, but missing real behavior proof and the incorrect terminal dedupe state make it not merge-ready yet.
add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body only describes source-level verification and says no live gateway idle-timeout run was tested, so external-contributor real behavior proof is still needed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

P1: The linked bug affects a real gateway/ACP flow where users lose terminal timeout feedback from a running agent session.
merge-risk: 🚨 message-delivery: The PR changes chat error delivery but can still cache the failed run as successful for duplicate sends or waiters.
rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🧂 unranked krab, patch quality is 🦐 gold shrimp, and The PR is a useful narrow fix attempt, but missing real behavior proof and the incorrect terminal dedupe state make it not merge-ready yet.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR body only describes source-level verification and says no live gateway idle-timeout run was tested, so external-contributor real behavior proof is still needed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

[P2] Record the returned error in terminal dedupe — src/gateway/server-methods/chat.ts:2890-2895
This branch broadcasts state: "error" for returned error payloads, but execution then falls through to the unchanged dedupe write, which records ok: true and status: "ok" for the same run. Duplicate chat.send calls and agent.wait read chat:<runId> from this map, so a timed-out run can still be reported as successful after the error event. Please set the dedupe entry to an error state with the timeout summary when errorPayloads.length > 0.
Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.91

Acceptance criteria:

node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts
git diff --check

What I checked:

Current main skips returned errors after agent start: On current main, deliveredReplies are collected, but once agentRunStarted is true the post-dispatch branch only calls emitUserTranscriptUpdate() and does not inspect isError payloads. (src/gateway/server-methods/chat.ts:2433, c8a35c4645dc)
Timeout path returns an error payload: The embedded runner turns a prompt idle timeout into a returned payload with text and isError: true, matching the linked issue's non-throwing flow. (src/agents/pi-embedded-runner/run.ts:2693, c8a35c4645dc)
PR adds a broadcast without changing terminal dedupe: The PR diff adds the broadcastChatError branch, then falls through to the unchanged terminal setGatewayDedupeEntry block that writes ok: true and status: "ok". (src/gateway/server-methods/chat.ts:2890, 882f7f0cf557)
Cached chat sends reuse dedupe truth: Duplicate chat.send calls return the cached dedupe entry's ok, payload, and error, so an error run cached as ok can later be reported as success. (src/gateway/server-methods/chat.ts:2148, c8a35c4645dc)
Agent wait reads terminal status from the same dedupe map: readTerminalSnapshotFromDedupeEntry treats status: "error" or !entry.ok as the error signal; status: "ok" with ok: true is not an error snapshot. (src/gateway/server-methods/agent-wait-dedupe.ts:134, c8a35c4645dc)
Related issue review already called out dedupe consistency: The linked issue comments include a source-level ClawSweeper review and a follow-up note that the dedupe entry should probably be updated for this error case.

Likely related people:

medns: Current blame for the chat.send post-dispatch branch and terminal dedupe write points to commit 01d95b9, whose GitHub commit author is medns. (role: recent area contributor; confidence: high; commits: 01d95b9757a0; files: src/gateway/server-methods/chat.ts, src/gateway/server-methods/chat.directive-tags.test.ts)
steipete: git log -S agentRunStarted surfaced the earlier inbound dispatch unification commit as part of the chat.send lifecycle history. (role: feature-history contributor; confidence: medium; commits: 2e0a835e0750; files: src/gateway/server-methods/chat.ts)
dutifulbob: git log -S setGatewayDedupeEntry surfaced prior gateway work that bridged chat.send run IDs into agent.wait terminal dedupe behavior. (role: adjacent dedupe contributor; confidence: medium; commits: 61f7cea48bd7; files: src/gateway/server-methods/agent-wait-dedupe.ts, src/gateway/server-methods/chat.ts)
bittoby: Gateway chat delivery history includes prior work on webchat final/media delivery adjacent to the deliveredReplies final-event path. (role: adjacent delivery contributor; confidence: low; commits: c50d7183d64a; files: src/gateway/server-methods/chat.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against c8a35c4645dc.

clawsweeper · 2026-05-22T02:14:29Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 22, 2026

clawsweeper Bot mentioned this pull request May 24, 2026

Feat/fix dashboard timeout error display #85815

Merged

25 tasks

This comment was marked as spam.

Sign in to view

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(gateway): broadcast idle timeout errors to clients after agent run started#85176

fix(gateway): broadcast idle timeout errors to clients after agent run started#85176
JulyanXu wants to merge 1 commit into
openclaw:mainfrom
JulyanXu:fix/chat-broadcast-error-on-idle-timeout

JulyanXu commented May 22, 2026

Uh oh!

clawsweeper Bot commented May 22, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 22, 2026

Uh oh!

This comment was marked as spam.

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

JulyanXu commented May 22, 2026

Summary

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 22, 2026

Uh oh!

This comment was marked as spam.

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clawsweeper Bot commented May 22, 2026 •

edited

Loading