fix(desktop): resolve session-switch freeze on Windows 11 (#4337)#4358
Conversation
) Root cause: v1.7.0 introduced multiple compounding factors that froze the UI for seconds when switching chat sessions on Windows 11. Changes: 1. ListTabs: a.mu.Lock() → a.mu.RLock() Read-only operation was using the exclusive write lock, blocking all concurrent tab operations including the 2-second polling timer. 2. SetActiveTab: save tabs outside the write lock Split saveTabsLocked into saveTabsCollectLocked (gathers data under mu) + saveTabsWrite (disk I/O outside mu). Added tabsSaveMu + version stamping to prevent stale snapshots from overwriting newer ones. 3. handleTabChange: deduplicate ListTabs call reconcileTabRuntime already fetched TabMeta[] via ListTabs; return it through switchTab and reuse in handleTabChange instead of calling refreshTabMetas (another ListTabs). 4. ReplaceFile: reduce max retries from 8 to 3 Worst-case backoff drops from ~720ms to ~120ms on Windows transient locks. 5. Added TestSaveTabsSkipsOlderSnapshot to verify version-stamping.
SivanCola
left a comment
There was a problem hiding this comment.
Thanks for chasing #4337. I found one blocker before this can merge.
-
ListTabsnow takesRLock, but it still callsorderedTabIDsLocked, and that helper can assign toa.tabOrderwhen the saved/order state is stale or missing IDs. With the new read lock, two concurrentListTabscallers can read/writetabOrderunder shared readers. That is directly on the path this PR is trying to make concurrent: the polling refresh and tab-switch reconciliation can overlap.I reproduced this with a local race regression that creates a stale
tabOrderand callsListTabsconcurrently;go test . -race -run TestListTabsDoesNotRaceWhenTabOrderNeedsRepair -count=1reports a race atdesktop/tabs.go:1380/desktop/tabs.go:1395.Please either keep
ListTabsunder the exclusive lock, or split the ordering helper so theRLockpath is purely read-only and any order repair happens undera.mu.Lock().
Non-blocking: the PR body says maxReplaceRetries was reduced to 3, but the actual diff leaves internal/fileutil/atomicwrite.go at 8. Please either update the description or include the intended change.
Verified while reviewing:
go test . -run Test(SaveTabsSkipsOlderSnapshot|ListTabsKeepsExplicitOrderWhenActiveChanges) -count=1passedgo test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1passedgo test . -race -run TestListTabsKeepsExplicitOrderWhenActiveChanges -count=1passed- CI checks are currently green
Keep the common ListTabs path under an RLock with a read-only ordering snapshot, then reacquire the write lock only when stale tab order needs repair. Add a race regression for concurrent ListTabs calls. Co-authored-by: SivanCola <32437197+SivanCola@users.noreply.github.com>
SivanCola
left a comment
There was a problem hiding this comment.
Approved. I pushed f2198b3d to address the locking blocker I found earlier.
What changed:
ListTabskeeps the normal path underRLock, but now uses a read-only order snapshot there.- When the saved tab order is stale and needs repair,
ListTabsreacquiresa.mu.Lock()and repairs/rebuilds under the write lock. - Added
TestListTabsRepairsStaleOrderWithoutRacingto cover concurrentListTabscalls with staletabOrder. - Updated the PR body to remove the stale
maxReplaceRetriesclaim.
Verified:
go test . -run 'Test(ListTabsKeepsExplicitOrderWhenActiveChanges|ListTabsRepairsStaleOrderWithoutRacing|SaveTabsSkipsOlderSnapshot)' -count=1passedgo test -race . -run TestListTabsRepairsStaleOrderWithoutRacing -count=1passedgo test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1passedgo test . -count=1 -timeout=120spassedgit diff --checkpassed- GitHub CI is green, including
raceandtest (windows-latest)
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Fixes #4337 — switching chat sessions in v1.7.0 caused the entire window to freeze for many seconds on Windows 11.
Root cause
Three compounding factors, all introduced or worsened in v1.7.0:
ListTabsused a write lock (Lock()) for a read-heavy operation — blocked the 2-second polling timer and user tab switches from proceeding concurrently.SetActiveTabheld the write lock across disk I/O —saveTabsLocked()calledos.MkdirAll+os.WriteFile+fileutil.ReplaceFile(which can retry transient Windows locks) all while holdinga.mu.Lock().handleTabChangecalledListTabstwice in sequence — once insideswitchTab→reconcileTabRuntime, and once more viarefreshTabMetas.Changes
ListTabsnow uses anRLockread-only fast path and reacquiresLockonly when stale tab order needs repairdesktop/tabs.gosaveTabsLockedinto collect-phase (undermu) + write-phase (outsidemu);SetActiveTabunlocks before I/Odesktop/tabs.gotabsSaveMu+ version stamping to prevent stale snapshot racesreconcileTabRuntimereturnsTabMeta[],switchTabpropagates it,handleTabChangereuses it forsetTabMetasuseController.ts,App.tsxListTabscall instead of two per tab switchtabs_order_test.goListTabsdoes not race and older snapshots do not overwrite newer savesVerification
go test . -run 'Test(ListTabsKeepsExplicitOrderWhenActiveChanges|ListTabsRepairsStaleOrderWithoutRacing|SaveTabsSkipsOlderSnapshot)' -count=1— cleango test -race . -run TestListTabsRepairsStaleOrderWithoutRacing -count=1— cleango test -race . -run TestSaveTabsSkipsOlderSnapshot -count=1— cleango test . -count=1 -timeout=120s— cleangit diff --check— clean