fix: don't broadcast state:error on per-attempt lifecycle errors#60043
fix: don't broadcast state:error on per-attempt lifecycle errors#60043vincentkoc merged 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a race condition where a per-attempt Confidence Score: 5/5Safe to merge — the fix is well-scoped, the two new tests directly verify both key invariants, and the memory-tag stripping enhancement is additive with no behavioral regressions. No P0 or P1 findings. The core fix correctly separates per-attempt from terminal error signaling, preserves run context for fallbacks, and resets only the streaming buffer. The relevant-memories stripping is a clean additive change with test coverage. All 31 existing agent-event tests still pass per the test plan. No files require special attention. Reviews (1): Last reviewed commit: "fix: don't broadcast state:error on per-..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8330e68eab
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
8330e68 to
53105c3
Compare
53105c3 to
ad8812b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad8812bf63
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const timer = setTimeout(() => { | ||
| pendingTerminalLifecycleErrors.delete(evt.runId); | ||
| finalizeLifecycleEvent(evt, opts); | ||
| }, delayMs); | ||
| timer.unref?.(); | ||
| pendingTerminalLifecycleErrors.set(evt.runId, timer); |
There was a problem hiding this comment.
Cancel delayed lifecycle-error finalizers on server shutdown
The new retry-grace path stores terminal-error finalization in setTimeout, but these timers are not tied to gateway teardown. If a run emits lifecycle:error and the gateway is closed/restarted before the grace window elapses, the timeout still fires and executes finalizeLifecycleEvent, which can emit/persist a stale terminal error and clear run state after shutdown (the close path only unsubscribes listeners, it does not drain these timers). This is a regression introduced by the deferred-error logic and can produce post-shutdown state mutations.
Useful? React with 👍 / 👎.
…nclaw#60043) (thanks @jwchmodx) (openclaw#60043) Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…nclaw#60043) (thanks @jwchmodx) (openclaw#60043) Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…nclaw#60043) (thanks @jwchmodx) (openclaw#60043) Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
…nclaw#60043) (thanks @jwchmodx) (openclaw#60043) Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Summary
handleAgentEndemits a lifecyclephase: "error"per attempt — before any fallback model is triedserver-chat.tswas translating this into astate: "error"chat event immediately, causing the TUI to callterminateRun+maybeRefreshHistoryForRun→chatLog.clearAll()Fix: on lifecycle
phase: "error", only reset the streaming buffer so the next attempt starts clean. Don't broadcaststate: "error"and don't clear the run context. The terminalstate: "error"is already sent bybroadcastChatErrorinchat.tswhen the entire command fails — that's the correct signal for the TUI to terminate the run.The run context is kept alive across per-attempt errors so fallback attempts can still resolve the session key.
clearAgentRunContextis deferred toagent-command.ts'sfinallyblock, which runs after all attempts complete.Fixes #59570
Test plan
does not emit state:error on per-attempt lifecycle error — fallback can still succeed— verifies primary error + fallback success produces only delta/final, no error event, and the final payload carries the fallback's textpreserves run context across per-attempt errors so fallback events resolve session key— verifies the run context survives a per-attempt errorserver-chat.agent-events.test.tstests pass (31 total)🤖 Generated with Claude Code