Skip to content

fix: don't broadcast state:error on per-attempt lifecycle errors#60043

Merged
vincentkoc merged 1 commit intoopenclaw:mainfrom
jwchmodx:fix/tui-fallback-output-loss
Apr 6, 2026
Merged

fix: don't broadcast state:error on per-attempt lifecycle errors#60043
vincentkoc merged 1 commit intoopenclaw:mainfrom
jwchmodx:fix/tui-fallback-output-loss

Conversation

@jwchmodx
Copy link
Copy Markdown
Contributor

@jwchmodx jwchmodx commented Apr 3, 2026

Summary

  • When the embedded PI model fails (stopReason === "error"), handleAgentEnd emits a lifecycle phase: "error" per attempt — before any fallback model is tried
  • server-chat.ts was translating this into a state: "error" chat event immediately, causing the TUI to call terminateRun + maybeRefreshHistoryForRunchatLog.clearAll()
  • If a fallback model then succeeded, the TUI had already wiped the chat log via the async history reload, so the fallback's output was lost

Fix: on lifecycle phase: "error", only reset the streaming buffer so the next attempt starts clean. Don't broadcast state: "error" and don't clear the run context. The terminal state: "error" is already sent by broadcastChatError in chat.ts when the entire command fails — that's the correct signal for the TUI to terminate the run.

The run context is kept alive across per-attempt errors so fallback attempts can still resolve the session key. clearAgentRunContext is deferred to agent-command.ts's finally block, which runs after all attempts complete.

Fixes #59570

Test plan

  • New test: does not emit state:error on per-attempt lifecycle error — fallback can still succeed — verifies primary error + fallback success produces only delta/final, no error event, and the final payload carries the fallback's text
  • New test: preserves run context across per-attempt errors so fallback events resolve session key — verifies the run context survives a per-attempt error
  • All existing server-chat.agent-events.test.ts tests pass (31 total)
  • All TUI tests pass (217 total)

🤖 Generated with Claude Code

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 3, 2026

Greptile Summary

This PR fixes a race condition where a per-attempt phase: \"error\" lifecycle event was being translated into a state: \"error\" chat broadcast, causing the TUI to terminate the run and wipe the chat log before any fallback model had a chance to complete. The fix defers that terminal signal to broadcastChatError (which fires only when all attempts fail) and instead just resets the streaming buffer so fallback attempts start clean. The PR also bundles an unrelated enhancement that strips <relevant-memories> tags injected by the memory plugin from user messages in both the gateway sanitizer (chat-sanitize.ts) and the UI chat extract (message-extract.ts), reusing the already-existing stripRelevantMemoriesTags helper after making it public.

Confidence Score: 5/5

Safe to merge — the fix is well-scoped, the two new tests directly verify both key invariants, and the memory-tag stripping enhancement is additive with no behavioral regressions.

No P0 or P1 findings. The core fix correctly separates per-attempt from terminal error signaling, preserves run context for fallbacks, and resets only the streaming buffer. The relevant-memories stripping is a clean additive change with test coverage. All 31 existing agent-event tests still pass per the test plan.

No files require special attention.

Reviews (1): Last reviewed commit: "fix: don't broadcast state:error on per-..." | Re-trigger Greptile

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8330e68eab

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/gateway/server-chat.ts Outdated
Comment thread src/gateway/server-chat.ts Outdated
@vincentkoc vincentkoc self-assigned this Apr 6, 2026
@vincentkoc vincentkoc force-pushed the fix/tui-fallback-output-loss branch from 8330e68 to 53105c3 Compare April 6, 2026 13:46
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M labels Apr 6, 2026
@vincentkoc vincentkoc force-pushed the fix/tui-fallback-output-loss branch from 53105c3 to ad8812b Compare April 6, 2026 13:47
@vincentkoc vincentkoc merged commit c8298c5 into openclaw:main Apr 6, 2026
9 checks passed
@vincentkoc
Copy link
Copy Markdown
Member

Landed in c8298c5.

Source head was ad8812b.

Thanks @jwchmodx.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad8812bf63

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +656 to +661
const timer = setTimeout(() => {
pendingTerminalLifecycleErrors.delete(evt.runId);
finalizeLifecycleEvent(evt, opts);
}, delayMs);
timer.unref?.();
pendingTerminalLifecycleErrors.set(evt.runId, timer);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cancel delayed lifecycle-error finalizers on server shutdown

The new retry-grace path stores terminal-error finalization in setTimeout, but these timers are not tied to gateway teardown. If a run emits lifecycle:error and the gateway is closed/restarted before the grace window elapses, the timeout still fires and executes finalizeLifecycleEvent, which can emit/persist a stale terminal error and clear run state after shutdown (the close path only unsubscribes listeners, it does not drain these timers). This is a regression introduced by the deferred-error logic and can produce post-shutdown state mutations.

Useful? React with 👍 / 👎.

steipete pushed a commit to leoleedev/openclaw that referenced this pull request Apr 6, 2026
Mlightsnow pushed a commit to Mlightsnow/openclaw that referenced this pull request Apr 6, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: TUI stops displaying output after fallback succeeds

2 participants