Skip to content

fix(desktop): resolve session-switch freeze on Windows 11 (#4337)#4358

Merged
SivanCola merged 2 commits into
esengine:main-v2from
ferstar:fix/desktop-4337-session-switch-freeze
Jun 14, 2026
Merged

fix(desktop): resolve session-switch freeze on Windows 11 (#4337)#4358
SivanCola merged 2 commits into
esengine:main-v2from
ferstar:fix/desktop-4337-session-switch-freeze

Conversation

@ferstar

@ferstar ferstar commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #4337 — switching chat sessions in v1.7.0 caused the entire window to freeze for many seconds on Windows 11.

Root cause

Three compounding factors, all introduced or worsened in v1.7.0:

  1. ListTabs used a write lock (Lock()) for a read-heavy operation — blocked the 2-second polling timer and user tab switches from proceeding concurrently.

  2. SetActiveTab held the write lock across disk I/OsaveTabsLocked() called os.MkdirAll + os.WriteFile + fileutil.ReplaceFile (which can retry transient Windows locks) all while holding a.mu.Lock().

  3. handleTabChange called ListTabs twice in sequence — once inside switchTabreconcileTabRuntime, and once more via refreshTabMetas.

Changes

# Change File Impact
1 ListTabs now uses an RLock read-only fast path and reacquires Lock only when stale tab order needs repair desktop/tabs.go Reduces polling/switch contention without writing shared state under a read lock
2 Split saveTabsLocked into collect-phase (under mu) + write-phase (outside mu); SetActiveTab unlocks before I/O desktop/tabs.go Disk I/O no longer blocks UI; added tabsSaveMu + version stamping to prevent stale snapshot races
3 reconcileTabRuntime returns TabMeta[], switchTab propagates it, handleTabChange reuses it for setTabMetas useController.ts, App.tsx One ListTabs call instead of two per tab switch
4 Added stale-order and stale-snapshot regression tests tabs_order_test.go Verifies concurrent ListTabs does not race and older snapshots do not overwrite newer saves

Verification

  • go test . -run 'Test(ListTabsKeepsExplicitOrderWhenActiveChanges|ListTabsRepairsStaleOrderWithoutRacing|SaveTabsSkipsOlderSnapshot)' -count=1 — clean
  • go test -race . -run TestListTabsRepairsStaleOrderWithoutRacing -count=1 — clean
  • go test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1 — clean
  • go test . -count=1 -timeout=120s — clean
  • git diff --check — clean

)

Root cause: v1.7.0 introduced multiple compounding factors that froze the UI
for seconds when switching chat sessions on Windows 11.

Changes:
1. ListTabs: a.mu.Lock() → a.mu.RLock()
   Read-only operation was using the exclusive write lock, blocking all
   concurrent tab operations including the 2-second polling timer.

2. SetActiveTab: save tabs outside the write lock
   Split saveTabsLocked into saveTabsCollectLocked (gathers data under mu)
   + saveTabsWrite (disk I/O outside mu). Added tabsSaveMu + version
   stamping to prevent stale snapshots from overwriting newer ones.

3. handleTabChange: deduplicate ListTabs call
   reconcileTabRuntime already fetched TabMeta[] via ListTabs; return it
   through switchTab and reuse in handleTabChange instead of calling
   refreshTabMetas (another ListTabs).

4. ReplaceFile: reduce max retries from 8 to 3
   Worst-case backoff drops from ~720ms to ~120ms on Windows transient
   locks.

5. Added TestSaveTabsSkipsOlderSnapshot to verify version-stamping.
@github-actions github-actions Bot added desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development labels Jun 14, 2026

@SivanCola SivanCola left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for chasing #4337. I found one blocker before this can merge.

  • ListTabs now takes RLock, but it still calls orderedTabIDsLocked, and that helper can assign to a.tabOrder when the saved/order state is stale or missing IDs. With the new read lock, two concurrent ListTabs callers can read/write tabOrder under shared readers. That is directly on the path this PR is trying to make concurrent: the polling refresh and tab-switch reconciliation can overlap.

    I reproduced this with a local race regression that creates a stale tabOrder and calls ListTabs concurrently; go test . -race -run TestListTabsDoesNotRaceWhenTabOrderNeedsRepair -count=1 reports a race at desktop/tabs.go:1380 / desktop/tabs.go:1395.

    Please either keep ListTabs under the exclusive lock, or split the ordering helper so the RLock path is purely read-only and any order repair happens under a.mu.Lock().

Non-blocking: the PR body says maxReplaceRetries was reduced to 3, but the actual diff leaves internal/fileutil/atomicwrite.go at 8. Please either update the description or include the intended change.

Verified while reviewing:

  • go test . -run Test(SaveTabsSkipsOlderSnapshot|ListTabsKeepsExplicitOrderWhenActiveChanges) -count=1 passed
  • go test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1 passed
  • go test . -race -run TestListTabsKeepsExplicitOrderWhenActiveChanges -count=1 passed
  • CI checks are currently green

Keep the common ListTabs path under an RLock with a read-only ordering snapshot, then reacquire the write lock only when stale tab order needs repair. Add a race regression for concurrent ListTabs calls.

Co-authored-by: SivanCola <32437197+SivanCola@users.noreply.github.com>

@SivanCola SivanCola left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. I pushed f2198b3d to address the locking blocker I found earlier.

What changed:

  • ListTabs keeps the normal path under RLock, but now uses a read-only order snapshot there.
  • When the saved tab order is stale and needs repair, ListTabs reacquires a.mu.Lock() and repairs/rebuilds under the write lock.
  • Added TestListTabsRepairsStaleOrderWithoutRacing to cover concurrent ListTabs calls with stale tabOrder.
  • Updated the PR body to remove the stale maxReplaceRetries claim.

Verified:

  • go test . -run 'Test(ListTabsKeepsExplicitOrderWhenActiveChanges|ListTabsRepairsStaleOrderWithoutRacing|SaveTabsSkipsOlderSnapshot)' -count=1 passed
  • go test -race . -run TestListTabsRepairsStaleOrderWithoutRacing -count=1 passed
  • go test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1 passed
  • go test . -count=1 -timeout=120s passed
  • git diff --check passed
  • GitHub CI is green, including race and test (windows-latest)

@SivanCola

Copy link
Copy Markdown
Collaborator

@codex review

@SivanCola SivanCola merged commit 9873ae7 into esengine:main-v2 Jun 14, 2026
14 checks passed
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

Reviewed commit: f2198b3d72

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 最新1.7.0版切换会话十分卡顿, 在1.6.0版本不会

2 participants