Skip to content

fix(gateway): surface resolved chat errors#84953

Closed
zhangguiping-xydt wants to merge 3 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-84945
Closed

fix(gateway): surface resolved chat errors#84953
zhangguiping-xydt wants to merge 3 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-84945

Conversation

@zhangguiping-xydt

@zhangguiping-xydt zhangguiping-xydt commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Broadcast state: "error" when a completed agent run delivered isError reply payloads, covering idle-timeout failures that resolve instead of throw.
  • Preserve the error outcome in the chat dedupe entry and add regression coverage for single and multiple delivered error payloads.
  • Preserve the existing user transcript update await/catch path before broadcasting the terminal chat error.

Fixes #84945

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: A completed gateway/ACP agent run can deliver a resolved reply payload with isError: true (for example LLM idle timeout (120s): no response from model). After this patch, that path emits a terminal chat event with state: "error" instead of leaving the web chat stuck as an in-progress run, and it still emits the user transcript update before the terminal error broadcast.
  • Real environment tested: Local OpenClaw checkout on Linux, branch feat/issue-84945, head c26ee4098bc851a0f1ee5ef3f36888f69b2f3db6, Node via repository scripts. The repro exercised the gateway chat send path with an ACP-style delivered error payload.
  • Exact steps or command run after this patch:
    1. node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts
    2. node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat.test.ts
    3. git diff --check
    4. node scripts/check-changed.mjs
  • Evidence after fix (terminal capture / copied live output):
Resolved ACP delivered-error payload replayed through gateway chat send:
  delivered reply payload: { text: "LLM idle timeout (120s): no response from model", isError: true }
  observed terminal chat event: { runId: "idem-agent-error-payload", sessionKey: "main", state: "error", errorMessage: "LLM idle timeout (120s): no response from model" }
  observed event order: ["user-transcript", "chat-error"]

Command output after the patch:
  node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts
  Test Files  2 passed (2)
  Tests       154 passed (154)

  node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat.test.ts
  Test Files  1 passed (1)
  Tests       24 passed (24)

  git diff --check
  completed with no output

  node scripts/check-changed.mjs
  conflict markers: pass
  changelog attributions: pass
  typecheck all: pass
  lint: pass
  runtime import cycles: 0 runtime value cycle(s)
  • Observed result after fix: The delivered error payload now produces a visible terminal chat broadcast with state: "error" and the exact idle-timeout message, and the user transcript update is observed before the error broadcast.
  • What was not tested: A live remote provider/network idle timeout against real credentials was not triggered; the local run used the same ACP delivered-error payload shape deterministically to avoid spending external provider quota.

Test plan

  • node scripts/run-vitest.mjs src/gateway/server-methods/chat.directive-tags.test.ts
  • node scripts/run-vitest.mjs src/gateway/server.chat.gateway-server-chat.test.ts
  • git diff --check
  • node scripts/check-changed.mjs

@openclaw-barnacle openclaw-barnacle Bot added app: web-ui App: web-ui gateway Gateway runtime size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR updates gateway chat completion handling so resolved delivered reply payloads marked isError broadcast a terminal chat error and are cached as error outcomes, with gateway regression tests.

Reproducibility: yes. by source inspection. Current main can receive resolved idle-timeout payloads with isError: true, but the agentRunStarted post-dispatch branch only emits the transcript update and caches ok; the linked issue includes matching session-log evidence.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: Strong deterministic proof and a narrow patch leave this as a normal good PR with no blocking findings, plus limited merge risk around terminal chat and dedupe semantics.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes copied after-fix local gateway output showing an ACP-style delivered error payload now emits a terminal chat error and preserves transcript ordering; the live provider timeout gap is stated and reasonable for this deterministic gateway path.

Risk before merge

  • The merge changes terminal chat and dedupe semantics for resolved delivered error payloads, so client-visible message delivery and idempotency replay behavior move from silent ok to terminal error.
  • The PR proof uses a deterministic local ACP-style delivered-error payload; it does not trigger a live remote provider idle timeout with real credentials.

Maintainer options:

  1. Land with targeted gateway coverage (recommended)
    Accept the scoped merge risk because the PR covers the resolved-error source contract, terminal broadcast, transcript ordering, and cached replay path in focused gateway tests.
  2. Require live timeout proof first
    Maintainers can ask for a redacted live-provider idle-timeout run before merge if they want end-to-end provider proof beyond the deterministic delivered-payload replay.

Next step before merge
No repair lane is needed; latest head resolves the concrete cached replay issue and the remaining action is normal maintainer and CI handling.

Security
Cleared: The diff is limited to gateway chat handling and tests, with no dependency, workflow, credential, install script, or supply-chain-sensitive change.

Review details

Best possible solution:

Land this gateway fix after normal maintainer and CI review, keeping the terminal-error handling in the gateway post-dispatch branch and the focused replay coverage with it.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection. Current main can receive resolved idle-timeout payloads with isError: true, but the agentRunStarted post-dispatch branch only emits the transcript update and caches ok; the linked issue includes matching session-log evidence.

Is this the best way to solve the issue?

Yes. The gateway post-dispatch branch is the narrow ownership boundary for live chat broadcasts and dedupe replay, and the latest PR head covers both the initial terminal error and cached replay semantics.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied after-fix local gateway output showing an ACP-style delivered error payload now emits a terminal chat error and preserves transcript ordering; the live provider timeout gap is stated and reasonable for this deterministic gateway path.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and Strong deterministic proof and a narrow patch leave this as a normal good PR with no blocking findings, plus limited merge risk around terminal chat and dedupe semantics.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes copied after-fix local gateway output showing an ACP-style delivered error payload now emits a terminal chat error and preserves transcript ordering; the live provider timeout gap is stated and reasonable for this deterministic gateway path.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P1: The PR fixes a user-visible gateway chat failure where model idle timeout errors can be silently dropped after an agent run has started.
  • merge-risk: 🚨 message-delivery: The patch changes whether delivered error payloads produce a terminal chat broadcast instead of no visible client error.
  • merge-risk: 🚨 session-state: The patch changes chat:<runId> dedupe entries from ok to error for resolved delivered-error completions, affecting idempotency replay and terminal wait snapshots.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and Strong deterministic proof and a narrow patch leave this as a normal good PR with no blocking findings, plus limited merge risk around terminal chat and dedupe semantics.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes copied after-fix local gateway output showing an ACP-style delivered error payload now emits a terminal chat error and preserves transcript ordering; the live provider timeout gap is stated and reasonable for this deterministic gateway path.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied after-fix local gateway output showing an ACP-style delivered error payload now emits a terminal chat error and preserves transcript ordering; the live provider timeout gap is stated and reasonable for this deterministic gateway path.

What I checked:

  • Current-main behavior: The current main post-dispatch branch for agentRunStarted only emits a user transcript update when there is no before_agent_run gate, then writes an ok chat:<runId> dedupe entry; it does not inspect delivered isError payloads in this path. (src/gateway/server-methods/chat.ts:2882, b77f36fb1cbd)
  • Resolved timeout contract: The agent runner can return a normal payload array containing { text: timeoutText, isError: true } for idle timeouts, so the gateway must handle resolved error payloads rather than only thrown errors. (src/agents/pi-embedded-runner/run.ts:2707, b77f36fb1cbd)
  • PR implementation: The PR adds buildDeliveredReplyErrorMessage, broadcasts broadcastChatError for agent-run delivered errors, and stores ok: false with an ErrorCodes.UNAVAILABLE error shape for cached replay. (src/gateway/server-methods/chat.ts:383, c26ee4098bbf)
  • PR regression coverage: The new tests assert the terminal chat error, node session broadcast, user-transcript-before-error ordering, cached idempotency replay preserving the timeout message, and joined multiple delivered error payloads. (src/gateway/server-methods/chat.directive-tags.test.ts:761, c26ee4098bbf)
  • Diff hygiene: The PR branch diff against its merge-base with current main only touches the gateway chat handler and its focused test, and git diff --check reported no whitespace errors. (c26ee4098bbf)
  • Feature-history provenance: Current visible blame for the gateway chat post-dispatch branch attributes the existing agentRunStarted/dedupe path to commit 04061bc by Vincent Koc. (src/gateway/server-methods/chat.ts:2637, 04061bc80189)

Likely related people:

  • Vincent Koc: git blame on current main attributes the existing gateway chat post-dispatch and dedupe branch to commit 04061bc. (role: recent area contributor; confidence: medium; commits: 04061bc80189; files: src/gateway/server-methods/chat.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against b77f36fb1cbd.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 21, 2026
@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Brave Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: finds missing screenshots.
Image traits: location CI tidepool; accessory commit compass; palette seafoam, black, and opal; mood celebratory; pose peeking out from the egg shell; shell smooth pearl shell; lighting calm overcast light; background gentle dashboard dots.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Brave Diff Drake in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

zhangguiping-xydt and others added 2 commits May 22, 2026 03:01
Broadcast delivered error payloads from completed agent runs so clients see terminal failures instead of silently stopping.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 21, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 21, 2026
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 21, 2026
@zhangguiping-xydt

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 21, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer proof before landing:

Behavior addressed: resolved gateway chat reply payloads marked isError emit a terminal chat error instead of leaving the run in progress
Real environment tested: current OpenClaw source checkout on main with this PR branch applied, plus GitHub PR checks.
Exact steps or command run after this patch: pnpm test src/gateway/server-methods/chat.directive-tags.test.ts src/gateway/server.chat.gateway-server-chat.test.ts; autoreview --mode branch --base origin/main; gh pr checks 84953 --watch=false.
Evidence after fix: focused tests passed locally, autoreview was clean with no accepted/actionable findings, and GitHub checks report Real behavior proof passing with no failed/pending code checks.
Observed result after fix: changed path behaved as expected in the focused regression lane.
What was not tested: a live remote provider idle timeout was not forced; deterministic ACP-style delivered-error payload proof covers the gateway path

@steipete

Copy link
Copy Markdown
Contributor

Thanks for tracking this down and for the detailed proof.

I'm going to close this as superseded by current main. The same behavior is now covered by #85355 (fix(gateway): broadcast agent-run error payloads), which added the returnedAgentErrorPayloads path in src/gateway/server-methods/chat.ts and records the chat dedupe entry as an error when an agent-run reply payload has isError.

Proof on current main:

  • git log --oneline -n 1 -- src/gateway/server-methods/chat.ts includes 07e61fc847 fix(gateway): broadcast agent-run error payloads (#85355).
  • src/gateway/server-methods/chat.ts now broadcasts the terminal chat error for returned agent error payloads and stores an error dedupe payload.

Appreciate the clean repro and tests here; the fix shape landed through the newer PR, so we do not need to carry this conflicting branch forward.

@steipete steipete closed this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

app: web-ui App: web-ui gateway Gateway runtime merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM idle timeout error silently dropped when agentRunStarted is true

2 participants