Skip to content

fix(tui): resync streaming watchdog after reconnect#74224

Merged
fabianwilliams merged 3 commits intoopenclaw:mainfrom
rogerdigital:fix-69081-tui-streaming-watchdog
Apr 29, 2026
Merged

fix(tui): resync streaming watchdog after reconnect#74224
fabianwilliams merged 3 commits intoopenclaw:mainfrom
rogerdigital:fix-69081-tui-streaming-watchdog

Conversation

@rogerdigital
Copy link
Copy Markdown
Contributor

@rogerdigital rogerdigital commented Apr 29, 2026

Summary

  • treat active-run tool and non-terminal lifecycle events as streaming watchdog proof-of-life
  • pause the watchdog while disconnected, then rearm it on reconnect without dropping a legitimately in-flight activeChatRunId
  • keep reconnect history recovery armed until chat streaming resumes, so terminal lifecycle-only reconnects still reload history when the final chat event was missed
  • reload history when a reconnect-stale run times out so missed finals surface without needing a dummy follow-up message

Fixes #69081.

Validation

  • pnpm test src/tui/tui-event-handlers.test.ts src/tui/tui-session-actions.test.ts src/tui/tui-command-handlers.test.ts src/tui/embedded-backend.test.ts
  • pnpm exec oxfmt --check --threads=1 src/tui/tui-event-handlers.ts src/tui/tui.ts src/tui/tui-event-handlers.test.ts CHANGELOG.md
  • git diff --check

Notes

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 29, 2026

Codex review: needs maintainer review before merge.

Keep this PR open for normal maintainer review. Current main still lacks the TUI reconnect/watchdog resync behavior for #69081, while this PR is a focused implementation candidate with tests and no obvious supply-chain-sensitive changes.

Maintainer follow-up before merge:

Keep this PR open and review it as the focused implementation for #69081. If the approach is sound, land the narrow TUI fix that treats active-run tool and non-terminal lifecycle events as proof-of-life, pauses watchdog timers while disconnected, rearms reconnect recovery without dropping legitimately in-flight runs, and reloads history when reconnect-stale runs time out.

Best possible solution:

Keep this PR open and review it as the focused implementation for #69081. If the approach is sound, land the narrow TUI fix that treats active-run tool and non-terminal lifecycle events as proof-of-life, pauses watchdog timers while disconnected, rearms reconnect recovery without dropping legitimately in-flight runs, and reloads history when reconnect-stale runs time out.

Acceptance criteria:

  • pnpm test src/tui/tui-event-handlers.test.ts src/tui/tui-session-actions.test.ts src/tui/tui-command-handlers.test.ts src/tui/embedded-backend.test.ts
  • pnpm exec oxfmt --check --threads=1 src/tui/tui-event-handlers.ts src/tui/tui.ts src/tui/tui-event-handlers.test.ts CHANGELOG.md
  • git diff --check
  • pnpm check:changed via Testbox before maintainer handoff

What I checked:

  • current_main_delta_only_watchdog: Current main arms the streaming watchdog from the chat delta branch for the active run; tool and lifecycle proof-of-life are separate paths. (src/tui/tui-event-handlers.ts:338, a1197b907524)
  • current_main_quiet_tool_events_do_not_refresh_watchdog: The tool-event branch returns when verbose mode is off before doing any watchdog refresh, matching the long-tool false-idle path in [Bug]: TUI: "streaming" status desyncs from actual run state #69081. (src/tui/tui-event-handlers.ts:439, a1197b907524)
  • current_main_lifecycle_events_do_not_rearm_watchdog: Lifecycle events update activity status for active runs but do not arm, pause, or resync the streaming watchdog. (src/tui/tui-event-handlers.ts:474, a1197b907524)
  • current_main_reconnect_has_no_watchdog_hooks: runTui currently receives only handleChatEvent, handleAgentEvent, and handleBtwEvent; onConnected reloads history and onDisconnected updates status, but neither calls pause/reconnect watchdog helpers. (src/tui/tui.ts:903, a1197b907524)
  • pr_diff_targets_missing_behavior: The latest PR diff adds pauseStreamingWatchdog/reconnectStreamingWatchdog, active-run tool/lifecycle watchdog refreshes, reconnect-stale history recovery, and regression tests for quiet tool events, reconnect pause/rearm, overlapping history reload, stale reconnect runs, and lifecycle-only reconnects. (215f687d69e1)
  • security_review_surface: Security pass found the PR touches only CHANGELOG.md and TUI TypeScript/test files. It does not change workflows, dependencies, lockfiles, package scripts, permissions, secret handling, artifact downloads, or publishing metadata. (215f687d69e1)

Likely related people:

Remaining risk / open question:

Codex review notes: model gpt-5.5, reasoning high; reviewed against a1197b907524.

@rogerdigital rogerdigital marked this pull request as ready for review April 29, 2026 08:59
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR fixes the streaming watchdog in the TUI to survive reconnects: it pauses the watchdog on disconnect, rearms it (preserving activeChatRunId) on reconnect, counts active-run tool events and non-terminal lifecycle events as proof-of-life, and reloads history when a reconnect-stale run times out rather than showing a noisy warning.

Confidence Score: 4/5

Safe to merge; both findings are narrow edge-case P2s that don't affect the common reconnect path.

No P0/P1 findings. Two P2 issues: a possible double loadHistory call when pendingHistoryRefresh and reconnectPendingRunId are both live simultaneously, and a stale activityStatus when reconnectStreamingWatchdog drops a run that is no longer in sessionRuns. Both are unlikely in normal usage and non-corrupting. P2-only ceiling is 4/5.

src/tui/tui-event-handlers.ts — the two edge-case paths in armStreamingWatchdog's timeout and reconnectStreamingWatchdog's stale-run branch.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tui/tui-event-handlers.ts
Line: 123-128

Comment:
**Potential double `loadHistory` call in reconnect watchdog path**

`flushPendingHistoryRefreshIfIdle()` is called unconditionally on line 123 before the `reconnectPendingRunId` guard on line 124. If `pendingHistoryRefresh` happens to be `true` at this point (e.g. a local BTW run finalized while the reconnect-pending run was still active), both `flushPendingHistoryRefreshIfIdle` and the explicit `void loadHistory?.()` on line 126 will fire, issuing two concurrent history fetches. Consider moving the `reconnectPendingRunId` check before `flushPendingHistoryRefreshIfIdle`, or resetting `pendingHistoryRefresh` inside the reconnect branch to prevent the duplicate call.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/tui/tui-event-handlers.ts
Line: 236-240

Comment:
**`activityStatus` not reset when stale `activeChatRunId` is dropped on reconnect**

`state.activeChatRunId` is set to `null` on line 238 before `clearStaleStreamingRunIfNoTrackedRunRemains()` is called. That helper's first check is `!activeRunId`, so it returns early immediately — leaving `state.activityStatus` stuck at `"streaming"` and `flushPendingHistoryRefreshIfIdle()` never called. After reconnect the TUI can keep displaying a streaming indicator for a run that no longer exists. The fix is to call `setActivityStatus("idle")` (and `flushPendingHistoryRefreshIfIdle()`) explicitly in this branch instead of delegating to a helper that will no-op on the already-nulled field.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(tui): keep reconnect history fallbac..." | Re-trigger Greptile

Comment thread src/tui/tui-event-handlers.ts Outdated
Comment thread src/tui/tui-event-handlers.ts
Copy link
Copy Markdown
Contributor

@fabianwilliams fabianwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Watchdog state-machine fix done correctly: (1) treats active-run tool events as proof-of-life so legitimate long-running tool calls don't flip to false-idle, (2) pause/rearm semantics around reconnect preserve activeChatRunId instead of dropping it, (3) reload-history-on-stale-reconnect-timeout closes the loop on missed final events. Tests cover all four scenarios distinctly. Changelog attribution to @EenvoudJasper is the right shape.

@fabianwilliams fabianwilliams merged commit d4e52f4 into openclaw:main Apr 29, 2026
70 checks passed
@rogerdigital rogerdigital deleted the fix-69081-tui-streaming-watchdog branch April 30, 2026 10:10
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
* fix(tui): resync streaming watchdog after reconnect

* fix(tui): keep reconnect history fallback armed

* fix(tui): tighten reconnect watchdog recovery
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* fix(tui): resync streaming watchdog after reconnect

* fix(tui): keep reconnect history fallback armed

* fix(tui): tighten reconnect watchdog recovery
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: TUI: "streaming" status desyncs from actual run state

2 participants