Skip to content

fix(tui): clear stuck streaming when final has no active run in flight#52745

Closed
lyksdu wants to merge 15 commits intoopenclaw:mainfrom
lyksdu:fix/tui-streaming-orphan-idle
Closed

fix(tui): clear stuck streaming when final has no active run in flight#52745
lyksdu wants to merge 15 commits intoopenclaw:mainfrom
lyksdu:fix/tui-streaming-orphan-idle

Conversation

@lyksdu
Copy link
Copy Markdown

@lyksdu lyksdu commented Mar 23, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: TUI status line can remain on streaming • … | connected after the assistant turn has visibly finished, when the chat final event is processed with wasActiveRun === false (i.e. state.activeChatRunId !== evt.runId) while earlier delta events had already called setActivityStatus("streaming"), and there is no concurrent in-flight run (sessionRuns empty after finalization, activeChatRunId already null).
  • Why it matters: Users and automation that infer “still generating” from the status line (or tmux capture) see false “streaming” for a long time; this shows up with failover, multi-stage tools, and subagent timing (e.g. long agent.wait / LLM timeout then recovery).
  • What changed: finalizeRun and terminateRun now also call setActivityStatus when shouldClearOrphanedActivityStatus() is true (!activeChatRunId && sessionRuns.size === 0), plus two unit tests (orphan clear vs concurrent run preserved).
  • What did NOT change (scope boundary): Gateway protocol, WS payloads, skills/tools, auth, memory, non-TUI UIs, and behavior when another run is still in sessionRuns (concurrent runs still do not get their activity cleared by an inactive final).

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes # (none filed; can link if an issue is opened)
  • Related # (optional: TUI streaming / activeChatRunId / subagent lane discussions)

User-visible / Behavior Changes

  • User-visible: In the edge case above, the TUI footer should return to a non-busy state (e.g. connected | idle) instead of staying on streaming indefinitely when no run remains in flight.
  • Defaults/config: None.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux / macOS (anywhere openclaw tui runs)
  • Runtime/container: Gateway + TUI client as usual
  • Model/provider: Observed with multi-agent + timeouts / model failover (e.g. modelhub); not model-specific
  • Integration/channel (if any): TUI (openclaw-tui)
  • Relevant config (redacted): Default; issue tied to event ordering, not a single config key

Steps

  1. Run a prompt that triggers subagents, LLM timeouts/failover, or long agent.wait (e.g. multi-agent search with slow or failing tools).
  2. Observe TUI footer during and after the assistant finishes rendering the main reply.
  3. (Optional) Compare with logs: final for a run may arrive when activeChatRunId no longer matches that runId, while sessionRuns is already empty.

Expected

  • Footer leaves busy streaming state when no chat run is in flight for the session.

Actual (before fix)

  • Footer can stay streaming • Ns | connected for minutes after content looks complete.

Evidence

Attach at least one:

  • Failing test/log before + passing after (new unit tests cover orphan vs concurrent cases)
  • Trace/log snippets (optional: gateway logs showing agent.wait timeout + failover around stuck UI)
  • Screenshot/recording (optional: TUI footer stuck on streaming)
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: Unit tests in src/tui/tui-event-handlers.test.ts for (1) inactive final clears activity when no runs remain, (2) inactive final does not clear when another run is still in flight.
  • Edge cases checked: Concurrent run-a / run-b path in test harness; explicit activeChatRunId = null before final to simulate pointer loss.
  • What you did not verify: Full interactive TUI regression on every OS; full matrix of gateway failover sequences (recommend maintainer spot-check).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert the commit touching src/tui/tui-event-handlers.ts (and associated tests).
  • Files/config to restore: Prior versions of tui-event-handlers.ts / tui-event-handlers.test.ts.
  • Known bad symptoms reviewers should watch for: If setActivityStatus fires too aggressively, concurrent runs might briefly show idle while another run is still streaming—mitigated by requiring sessionRuns.size === 0 and activeChatRunId === null.

Risks and Mitigations

  • Risk: Clearing activity when both activeChatRunId is null and sessionRuns is empty could theoretically align poorly with a future gateway behavior that delays noteSessionRun for a successor run.
    • Mitigation: Condition is narrow; any concurrent run still present in sessionRuns blocks the orphan clear. Add regression tests if gateway ordering changes.

lyksdu added 2 commits March 23, 2026 16:52
Add a function to determine if orphaned activity status should be cleared when no active chat run exists.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 23, 2026

Greptile Summary

This PR fixes a TUI status line bug where the footer could remain stuck on streaming indefinitely after a chat run finishes, specifically when the final event arrives with wasActiveRun === false (the active-run pointer had already been cleared or reassigned) and no other run remains in flight. The fix adds a narrow shouldClearOrphanedActivityStatus() guard (!activeChatRunId && sessionRuns.size === 0) applied to both finalizeRun and terminateRun, so the activity status is cleared when no concurrent run is in flight.

  • The guard is evaluated after noteFinalizedRun / sessionRuns.delete removes the finalizing run, making it correctly narrow — a concurrent run still in sessionRuns blocks the clear.
  • The second new test ("does not clear activity on inactive final when another run is still in flight") correctly validates the concurrent-run guard.
  • The first new test ("clears orphaned streaming when final arrives inactive but no runs remain in flight") does not actually exercise the new shouldClearOrphanedActivityStatus() code path: because activeChatRunId is null when the final arrives, handleChatEvent re-assigns it to the finalizing run before wasActiveRun is captured, causing wasActiveRun = true and routing through the pre-existing branch. The test passes regardless of the new helper, giving false confidence that shouldClearOrphanedActivityStatus() is covered.

Confidence Score: 5/5

  • Safe to merge; the guard condition is narrow and the change is backward-compatible with no risk of prematurely clearing a running stream.
  • The production fix is logically correct and the condition is tight enough that regressions are very unlikely. The only gap is that the first added test doesn't actually cover shouldClearOrphanedActivityStatus(), but this is a test-quality issue, not a production risk. The second test correctly validates the concurrent-run guard.
  • No files require special attention; the test gap noted in src/tui/tui-event-handlers.test.ts is a non-blocking follow-up.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tui/tui-event-handlers.test.ts
Line: 397-423

Comment:
**Test doesn't exercise the new code path**

This test sets `state.activeChatRunId = null` and then sends the final for the same `"run-orphan"` run. Inside `handleChatEvent`, the final event goes through `noteSessionRun("run-orphan")` first, and then:

```ts
if (!state.activeChatRunId && !isLocalBtwRunId?.(evt.runId)) {
  state.activeChatRunId = evt.runId;  // re-assigns to "run-orphan"
}
```

Because `activeChatRunId` is `null` and the run is not a btw run, it gets **re-registered as the active run** before `wasActiveRun` is captured. As a result, `wasActiveRun = true` and `setActivityStatus("idle")` is called via the **pre-existing `params.wasActiveRun` branch** — not via the new `shouldClearOrphanedActivityStatus()` path. The test would pass identically with or without the PR's changes.

To actually reach `shouldClearOrphanedActivityStatus()`, you'd need `activeChatRunId` to be non-null (pointing at a different run) so the re-assignment is skipped and `wasActiveRun` stays `false`, or the run must be a btw run (where the re-assignment is blocked by `!isLocalBtwRunId` being false). Consider adding a test case like:

```ts
// activeChatRunId pointing to a different, already-finalized run so re-assignment is skipped
state.activeChatRunId = "run-other"; // non-null, non-matching
// then finalize "run-other" first so sessionRuns is empty after run-orphan finalizes
handleChatEvent({ runId: "run-other", ..., state: "final", message: null });
// Now activeChatRunId is null (cleared by clearActiveRunIfMatch) and sessionRuns is empty
// Send streaming delta + final for run-orphan to validate shouldClearOrphanedActivityStatus()
```

This gap means `shouldClearOrphanedActivityStatus()` currently has no passing test that proves it's reachable and effective.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Add tests for orphaned streaming and act..." | Re-trigger Greptile

Comment thread src/tui/tui-event-handlers.test.ts
Update test to ensure activeChatRunId is null when final arrives inactive.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e380edcd2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/tui/tui-event-handlers.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3db3b912de

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/tui/tui-event-handlers.ts
lyksdu added 2 commits March 24, 2026 17:36
Removed pendingHistoryRefresh variable and related logic to streamline event handling.
Refactor setActivityStatus to update state directly and add regression test for handling late final events after errors.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a57b182d51

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/tui/tui-event-handlers.ts
Refactor event handler types and functions for better clarity and functionality.
lyksdu added 2 commits March 24, 2026 18:04
Added logic to manage pending history refresh based on active runs and session state.
@vincentkoc
Copy link
Copy Markdown
Member

ProjectClownfish could not safely update this branch, so it opened a narrow replacement PR instead.

Replacement PR: #72389
Source PR: #52745
Contributor credit is preserved in the replacement PR body and changelog plan.

@vincentkoc
Copy link
Copy Markdown
Member

ProjectClownfish could not safely update this branch, so it opened a narrow replacement PR instead.

Replacement PR: #72389
Source PR: #52745
Contributor credit is preserved in the replacement PR body and changelog plan.

1 similar comment
@vincentkoc
Copy link
Copy Markdown
Member

ProjectClownfish could not safely update this branch, so it opened a narrow replacement PR instead.

Replacement PR: #72389
Source PR: #52745
Contributor credit is preserved in the replacement PR body and changelog plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants