Skip to content

fix(desktop): close stuck thinking window after stream interruption (#3746)#3893

Closed
ashishexee wants to merge 1 commit into
esengine:main-v2from
ashishexee:fix/3746-thinking-window-stuck
Closed

fix(desktop): close stuck thinking window after stream interruption (#3746)#3893
ashishexee wants to merge 1 commit into
esengine:main-v2from
ashishexee:fix/3746-thinking-window-stuck

Conversation

@ashishexee

@ashishexee ashishexee commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Closes #3746

Problem

When the model service is interrupted during the thinking/reasoning phase, the "thinking window" (ProcessCard) gets stuck open and cannot be closed by clicking or pressing ESC. The UI shows a perpetual "running" spinner.

Solution

5 changes across 5 files (+113, -1):

Fix 1: Stale-stream watchdog (useController.ts)

Top-level useEffect watchdog that detects when the frontend has running + live set but no token events (text/reasoning) have arrived for 30 seconds. Uses a lastTokenAt ref (reset on turn_started/text/reasoning) and explicit deps [activeTabId, reconcileTabRuntime, activeState.running, activeState.live]. Calculates elapsed time and schedules a timeout for the remaining delay, with a double-check guard inside the callback. When triggered, calls reconcileTabRuntime() which queries the backend and dispatches backend_status: {running: false} if the backend is no longer running.

Addresses review feedback:

  • Hoisted from inside the event-subscription useEffect (was a nested hook → Rules of Hooks violation / crash) to a top-level sibling effect.
  • Replaced the no-deps "reset on every render" trick with a lastTokenAt ref + explicit deps, so the timer only re-arms when running/live/activeTabId change, and correctly measures time since the last token event.
  • Added turn_started to the lastTokenAt reset events to prevent spurious reconcile on fresh turns where lastTokenAt.current was still 0 or stale from a previous turn.

Fix 2: Panic recovery (controller.go)

Adds defer recover() to the runGuarded goroutine. On panic: sets c.running = false, emits TurnDone with error. On normal path: recover() returns nil, guard is a no-op — no double TurnDone.

Fix 3: Controlled open on reasoning card (Message.tsx)

Replaces defaultOpen={item.streaming} (uncontrolled, one-time useState initializer that ignores prop changes) with controlled mode: useState(item.streaming) seeds reasoningOpen, and open={reasoningOpen} + onOpenChange={setReasoningOpen} gives full control. No useEffect auto-close — the card stays open after normal completion (original UX: user can review reasoning), and the stuck-spinner is resolved by the watchdog transitioning the status icon from "running" to "done".

Addresses review feedback:

  • Completed turns from disk: useState(false) → card starts collapsed (matches original defaultOpen={false}).
  • Streaming turns: useState(true) → card starts open, stays open after completion.
  • Watchdog stuck case: when streaming → false, the status icon changes from spinner to checkmark, so the card no longer appears "stuck" even though it remains open.

Fix 4: Scoped ESC handler (ProcessCard.tsx)

Replaces the global document.addEventListener("keydown") (which closed ALL open cards on a single ESC and collided with ESC elsewhere) with a scoped onKeyDown on the <button> element. ESC only closes the card whose header button currently has focus. Removes useEffect import entirely.

Fix 5: Panic recovery tests (controller_test.go)

Two new tests:

  • TestRunGuardedPanicEmitsTurnDone — verifies panic recovery emits TurnDone with error and sets c.running = false
  • TestRunGuardedPanicDoesNotDoubleEmitTurnDone — verifies no double TurnDone emission on the panic path

Testing

go vet ./internal/control/...  — clean
go test ./internal/control/... — 22/22 pass (including 2 new panic recovery tests)
git diff --check               — clean

@github-actions github-actions Bot added v2 Go rewrite (1.x) — main-v2 branch, active development desktop Wails desktop app (desktop/**) agent Core agent loop (internal/agent, internal/control) labels Jun 10, 2026
@ashishexee ashishexee force-pushed the fix/3746-thinking-window-stuck branch from a180ca5 to f802d42 Compare June 10, 2026 19:45

@esengine esengine left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful write-up — the root-cause analysis on the lost turn_done is exactly right, and the Go-side panic recovery in runGuarded (plus the two tests covering emit-once-on-panic) is solid work. That part I'd take as-is.

The frontend changes need another pass before this can land, though — one of them is a hard crash:

1. The watchdog useEffect is nested inside another effect (blocker). In useController.ts the new useEffect is placed inside the body of the existing event-subscription useEffect (the one that opens with const textBatch = ... and closes with return () => { textBatch.drain(); ... }, }, [dispatchTo, ...])). Calling a hook from inside another hook's callback violates the Rules of Hooks — React throws Invalid hook call the moment that outer effect commits on mount, which takes the whole main view down. It needs to be hoisted to the top level of useController, as a sibling effect. (This is also why the symptom wouldn't show up in go test — it only fails when the desktop app actually renders.)

2. The no-deps "reset on every render" trick is fragile. Even once hoisted, an effect with no dependency array re-arms a 30s timer on every render for any reason, not just token events — so an unrelated re-render keeps resetting it and it may never fire, while a genuinely-idle-but-not-re-rendering tab fires it. Prefer tracking a lastTokenAt ref updated on token events and comparing against it, with an explicit deps array so react-hooks/exhaustive-deps stays happy.

3. open={!userClosedReasoning} regresses completed cards. With the initial state false, every reasoning card renders expanded — including finished turns and history loaded from disk, which previously collapsed via defaultOpen={item.streaming}. Derive the open state from streaming as well, e.g. keep it controlled but seed/close it when item.streaming goes false, so done turns stay collapsed.

4. Global ESC handler toggles every open card. The document-level keydown listener in ProcessCard is registered per instance, so with the reasoning card and a tool card both open, one ESC closes all of them at once — and it collides with ESC used elsewhere (cancel/close). Scope it to the focused/hovered card, or handle ESC at a single owner instead of per-card.

The diagnosis is good and the backend half is ready — fix the hook nesting first (that's the crash), then the open-state and ESC scoping, and I'll re-review.

@ashishexee

Copy link
Copy Markdown
Contributor Author

Thanks for the careful write-up — the root-cause analysis on the lost turn_done is exactly right, and the Go-side panic recovery in runGuarded (plus the two tests covering emit-once-on-panic) is solid work. That part I'd take as-is.

The frontend changes need another pass before this can land, though — one of them is a hard crash:

1. The watchdog useEffect is nested inside another effect (blocker). In useController.ts the new useEffect is placed inside the body of the existing event-subscription useEffect (the one that opens with const textBatch = ... and closes with return () => { textBatch.drain(); ... }, }, [dispatchTo, ...])). Calling a hook from inside another hook's callback violates the Rules of Hooks — React throws Invalid hook call the moment that outer effect commits on mount, which takes the whole main view down. It needs to be hoisted to the top level of useController, as a sibling effect. (This is also why the symptom wouldn't show up in go test — it only fails when the desktop app actually renders.)

2. The no-deps "reset on every render" trick is fragile. Even once hoisted, an effect with no dependency array re-arms a 30s timer on every render for any reason, not just token events — so an unrelated re-render keeps resetting it and it may never fire, while a genuinely-idle-but-not-re-rendering tab fires it. Prefer tracking a lastTokenAt ref updated on token events and comparing against it, with an explicit deps array so react-hooks/exhaustive-deps stays happy.

3. open={!userClosedReasoning} regresses completed cards. With the initial state false, every reasoning card renders expanded — including finished turns and history loaded from disk, which previously collapsed via defaultOpen={item.streaming}. Derive the open state from streaming as well, e.g. keep it controlled but seed/close it when item.streaming goes false, so done turns stay collapsed.

4. Global ESC handler toggles every open card. The document-level keydown listener in ProcessCard is registered per instance, so with the reasoning card and a tool card both open, one ESC closes all of them at once — and it collides with ESC used elsewhere (cancel/close). Scope it to the focused/hovered card, or handle ESC at a single owner instead of per-card.

The diagnosis is good and the backend half is ready — fix the hook nesting first (that's the crash), then the open-state and ESC scoping, and I'll re-review.

Thanks for the careful review and detailed feedback. You're absolutely right about the hook nesting issue — that's a blocker and I missed it during implementation. I'll rework the watchdog logic by hoisting it to the top level and will also address the concerns around timer re-arming, reasoning card open state, and ESC handling before updating the PR.

Appreciate the thorough review. I'll push a revised version shortly.

@ashishexee ashishexee force-pushed the fix/3746-thinking-window-stuck branch from f802d42 to e236823 Compare June 11, 2026 02:45
@ashishexee ashishexee force-pushed the fix/3746-thinking-window-stuck branch from e236823 to d153d18 Compare June 11, 2026 02:47
@ashishexee ashishexee requested a review from esengine June 11, 2026 03:14

@esengine esengine left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the latest push — all four points addressed properly:

  1. The watchdog is now a top-level sibling effect with a real dependency array; the lastTokenAt ref + remaining-time timer is exactly the shape I asked for, and re-arming off activeState.live means token flushes naturally reset it while a stalled stream still fires after the full 30s.
  2. Covered by the same rework.
  3. useState(item.streaming) as the controlled seed keeps history/finished cards collapsed on mount while letting the user close a stuck one — which is the original #3746 symptom.
  4. ESC now lives on the header button's own onKeyDown, so it only acts on the focused card and can't collide with global ESC semantics.

The Go-side panic recovery in runGuarded plus the emit-once tests were already solid in the first round. Nice turnaround — merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Core agent loop (internal/agent, internal/control) desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 无法关闭思考窗口

2 participants