fix(desktop): retry atomic writes past transient Windows locks#4193
Merged
Conversation
Four users crashed the same day on `rename ...desktop-topic-title-sources.json.tmp: Access is denied` — antivirus / the search indexer / a second instance briefly locks the destination JSON, MoveFileEx fails, and the error reached the frontend as an uncaught [unhandledrejection] that paints the crash overlay. The desktop's cosmetic-state writers (tabs, projects, topic titles/sources/created-ats, telemetry) did a raw os.Rename(tmp, path) with no retry, bypassing fileutil.ReplaceFile. Harden ReplaceFile to retry with a short backoff while the tmp source survives — a missing tmp means the write itself failed, so no retry can help — keep the EXDEV copy fallback, and route the six sites through it. Branch-meta / session / config writers already on ReplaceFile get the same robustness for free.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom
A recurring crash group, four different Windows users in one day (surfaced by the new dashboard summary column):
Cause
Antivirus, the Windows Search indexer, or a second instance holds the destination JSON for a few hundred milliseconds.
MoveFileEx(Go'sos.Rename) then fails withERROR_ACCESS_DENIED, and because the desktop's cosmetic-state writers did a rawos.Rename(tmp, path)— bypassingfileutil.ReplaceFileand with no retry — the error propagated up a bound method and reached the frontend as an uncaught promise rejection, painting the full crash overlay over a cosmetic title-cache write.Fix
fileutil.ReplaceFile: retry the replace a few times with a short escalating backoff while the tmp source still exists — a missing tmp means the write itself failed, so it fails fast with no pointless retry. The EXDEV copy fallback is unchanged. Every existing caller (branch metadata, session titles, config, dotenv, acp) gains the same Windows robustness.desktop/tabs.go: route the six rawos.Rename(tmp, path)writers (tabs, projects, topic titles / sources / created-ats, telemetry snapshot) throughReplaceFile.Verification
go test ./internal/fileutil— new tests cover fast-fail on missing tmp and retry-then-return-error when the destination can never be replaced (tmp preserved for the next attempt).go test ./desktop -run 'Topic|Tab|Telemetry|Title'andgo vet ./desktop— green.A truly permanent FS error (e.g. a read-only
.reasonix) still surfaces; that's a broken environment, not the transient lock these reports show. Catching the rejection frontend-side so even that degrades gracefully is a reasonable follow-up.