Skip to content

fix(tui): raise streaming watchdog threshold to 120s and suppress false-positive warning#69026

Open
jpruit20 wants to merge 1 commit intoopenclaw:mainfrom
jpruit20:fix/tui-streaming-watchdog-threshold
Open

fix(tui): raise streaming watchdog threshold to 120s and suppress false-positive warning#69026
jpruit20 wants to merge 1 commit intoopenclaw:mainfrom
jpruit20:fix/tui-streaming-watchdog-threshold

Conversation

@jpruit20
Copy link
Copy Markdown

Problem

The TUI streaming watchdog is set to 30 seconds, which triggers false-positive warnings during healthy long-running agent turns:

streaming watchdog: no stream updates for 30s; resetting status. The backend may have dropped this run silently — send a new message to resync.

This fires regularly during:

  • Tool calls that take >30s (browser automation, file operations, code generation)
  • Sub-agent work
  • Long Claude thinking turns
  • Any agent turn where the backend is working but hasn't sent a stream delta

The alarming "backend may have dropped this run" message causes unnecessary user confusion.

Changes

src/tui/tui-event-handlers.ts

  1. Threshold: DEFAULT_STREAMING_WATCHDOG_MS changed from 30_000 (30s) → 120_000 (120s)
  2. Message: chatLog.addSystem() call replaced with a comment — warning suppressed entirely

What's preserved

  • Internal reset behavior: activeChatRunId cleared, activityStatus set to idle, tui.requestRender() called
  • The context.streamingWatchdogMs override still works for custom values
  • Setting streamingWatchdogMs: 0 still disables the watchdog entirely

src/tui/tui-event-handlers.test.ts

  • Updated two test assertions to match the suppressed message behavior
  • All other watchdog tests unchanged (timer behavior, disposal, run isolation)

Rationale

The watchdog's internal reset (clearing stale run state, returning status to idle) is valuable. But the user-facing warning creates more confusion than it solves — users can always send a new message to resync if something is truly stuck, and the 120s threshold means the reset only fires for genuinely stalled connections rather than healthy long operations.

…se-positive warning

The TUI streaming watchdog was set to 30 seconds, which triggers
false-positive 'no stream updates' warnings during healthy long-running
agent turns (tool calls, sub-agent work, code generation, browser
automation). The alarming message ('backend may have dropped this run')
caused unnecessary user confusion.

Changes:
- Raise DEFAULT_STREAMING_WATCHDOG_MS from 30s to 120s
- Suppress the user-visible chatLog.addSystem() warning
- Internal reset behavior is fully preserved (activeChatRunId cleared,
  activityStatus set to idle, render requested)
- The override context.streamingWatchdogMs still works for custom values
- Update tests to match suppressed message behavior
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 19, 2026

Greptile Summary

This PR raises the streaming watchdog timeout from 30 s to 120 s and suppresses the user-visible warning entirely, keeping only the internal state reset (clearing activeChatRunId, setting status to idle, triggering a render). Tests are correctly updated to assert that chatLog.addSystem is no longer called.

The internal mechanics are sound, but completely removing the user-facing message means a genuine silent connection drop after 120 s will also produce no feedback — users will only see the status quietly change to idle with no explanation of why or what to do next.

Confidence Score: 5/5

Safe to merge; the change is intentional and technically correct, with properly updated tests.

All remaining feedback is P2 (UX/product preference around message suppression vs. softer messaging). No correctness, data-integrity, or security issues introduced.

src/tui/tui-event-handlers.ts — watchdog silent-reset UX tradeoff worth discussing before shipping.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tui/tui-event-handlers.ts
Line: 106-110

Comment:
**Silent reset trades one UX problem for another**

When the watchdog fires after a genuine connection drop (not just a long tool call), the status silently flips from `streaming``idle` with zero indication to the user. They won't know whether their request completed normally or was dropped, and they have no prompt to resync. A softer, lower-anxiety message (e.g., `"Agent turn is taking longer than expected — send a new message if you'd like to resync."`) would avoid the false-positive alarm tone while still surfacing the event. Alternatively, a debug-level log (if the codebase has a logger) would at least preserve diagnosability without any user noise.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(tui): raise streaming watchdog thres..." | Re-trigger Greptile

Comment on lines +106 to +110
/* Watchdog fires silently — the internal reset (clearing activeChatRunId,
setting status idle) still happens, but the user-visible warning is
suppressed to avoid false-positive noise during long-running agent
turns (tool calls, sub-agent work, code generation). The user can
always send a new message to resync if something is truly stuck. */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent reset trades one UX problem for another

When the watchdog fires after a genuine connection drop (not just a long tool call), the status silently flips from streamingidle with zero indication to the user. They won't know whether their request completed normally or was dropped, and they have no prompt to resync. A softer, lower-anxiety message (e.g., "Agent turn is taking longer than expected — send a new message if you'd like to resync.") would avoid the false-positive alarm tone while still surfacing the event. Alternatively, a debug-level log (if the codebase has a logger) would at least preserve diagnosability without any user noise.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/tui/tui-event-handlers.ts
Line: 106-110

Comment:
**Silent reset trades one UX problem for another**

When the watchdog fires after a genuine connection drop (not just a long tool call), the status silently flips from `streaming``idle` with zero indication to the user. They won't know whether their request completed normally or was dropped, and they have no prompt to resync. A softer, lower-anxiety message (e.g., `"Agent turn is taking longer than expected — send a new message if you'd like to resync."`) would avoid the false-positive alarm tone while still surfacing the event. Alternatively, a debug-level log (if the codebase has a logger) would at least preserve diagnosability without any user noise.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: found issues before merge.

Summary
The PR raises the TUI streaming watchdog default from 30s to 120s, suppresses the normal user-visible watchdog warning, and updates watchdog tests to expect silent resets.

Reproducibility: yes. from source inspection. Current fake-timer tests send a chat delta, advance past streamingWatchdogMs, and assert both the idle reset and the streaming watchdog system message, with a second test covering repeated messages.

Next step before merge
The changelog gap is mechanical, but the central blocker is maintainer choice on watchdog UX: silent reset versus softer, deduped, or configurable feedback.

Security
Cleared: The diff only touches TUI watchdog TypeScript code and tests, with no security-sensitive workflow, dependency, lockfile, script, secret, package, or artifact handling changes.

Review findings

  • [P3] Add the required changelog entry — src/tui/tui-event-handlers.ts:51
Review details

Best possible solution:

Keep the current proof-of-life and reconnect fixes, then choose a maintainable UX for normal watchdog expirations: softer/deduped feedback or an explicit user-facing threshold setting rather than silently hiding all events by default.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection. Current fake-timer tests send a chat delta, advance past streamingWatchdogMs, and assert both the idle reset and the streaming watchdog system message, with a second test covering repeated messages.

Is this the best way to solve the issue?

Unclear. The patch is narrow, but complete suppression is a maintainer UX choice because it removes diagnostics for genuine silent drops; a softer deduped warning or explicit configuration may be safer.

Full review comments:

  • [P3] Add the required changelog entry — src/tui/tui-event-handlers.ts:51
    This is a user-facing fix(tui) behavior change, but the PR only updates TUI code and tests. OpenClaw policy requires a CHANGELOG.md entry before merging user-facing fix, feature, or perf changes.
    Confidence: 0.88

Overall correctness: patch is correct
Overall confidence: 0.78

Acceptance criteria:

  • pnpm test src/tui/tui-event-handlers.test.ts
  • pnpm exec oxfmt --check --threads=1 src/tui/tui-event-handlers.ts src/tui/tui-event-handlers.test.ts CHANGELOG.md
  • git diff --check

What I checked:

Likely related people:

  • xantorres: The TUI streaming watchdog, including the 30s delta-silence window and user-visible system warning, appears to date to the watchdog feature commit credited to @xantorres. (role: introduced behavior; confidence: high; commits: f44ab20d4db5; files: src/tui/tui-event-handlers.ts, src/tui/tui-event-handlers.test.ts, CHANGELOG.md)
  • obviyus: Authored the follow-up commit keeping the TUI watchdog bound to the active run, which is directly adjacent to the timer/run-ownership semantics touched here. (role: adjacent maintainer; confidence: medium; commits: 352527393079; files: src/tui/tui-event-handlers.ts, src/tui/tui-event-handlers.test.ts)
  • rogerdigital: Authored the recent merged reconnect/proof-of-life watchdog fix fix(tui): resync streaming watchdog after reconnect #74224 that changed the same timer and recovery paths while intentionally leaving the normal warning behavior in place. (role: recent adjacent maintainer; confidence: high; commits: d4e52f454255; files: src/tui/tui-event-handlers.ts, src/tui/tui.ts, src/tui/tui-event-handlers.test.ts)

Remaining risk / open question:

  • Silent normal watchdog resets may hide genuine dropped runs and remove the user's resync guidance.
  • The branch predates later watchdog reconnect/proof-of-life changes, so it needs a rebase and focused validation before merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9a49d143c819.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant