fix: chain reloadForScope when scope/commit changes mid-flight by tomasz-tomczyk · Pull Request #482 · tomasz-tomczyk/crit

tomasz-tomczyk · 2026-05-07T11:43:41Z

Summary

The reloadInFlight dedup guard in reloadForScope() returned the in-flight promise unconditionally, so a click that swapped scope while a previous reload was still running awaited the OLD scope's promise. The new scope's /api/session was never fetched — page stayed on the previous scope's files while the toggle visually flipped.
Surfaced as a Windows-only e2e flake. Linux file I/O finishes loadAllFileData() faster than the next click, so the dedup branch was rarely hit. Windows runners are slow enough that the reload outlasts the next click handler, and the race fired deterministically — which is why different tests failed across runs (commit-selection, round-complete, threading).
Fix: track (scope, commit) per in-flight reload. Same key → dedup (existing behaviour, collapses double-triggers like SSE base-changed landing during a manual click). Different key → chain after the current reload finishes via .then(reloadForScope, reloadForScope). The rejection arm prevents a failed reload from poisoning subsequent scope changes.

Review

Code review: passed (intent clean, frontend-expert walked the chain timing — re-entry after finally runs against reloadInFlight === null, so chained calls re-read diffScope/diffCommit and dispatch correctly)
Parity audit: N/A — crit-web/ doesn't have scope toggles (single shared review viewer)

Test plan

commit-selection.spec.ts runs locally against the freshly built binary: 13/13 pass on macOS including the previously-flaky :111:7 "commit picker reappears when switching back to All scope".
ESLint clean. gofmt, go vet, golangci-lint all clean (no Go changes).
The real verification is the Windows CI run — e2e-windows has been red on main since feat: add Windows + WSL support #459 landed; this PR should turn it green.

🤖 Generated with Claude Code

The dedup guard returned the in-flight reload promise unconditionally, so a click that swapped scope while a previous reload was still running awaited the OLD scope's promise. The new scope's /api/session was never fetched, leaving the page rendered with the previous scope's files but the new toggle visually active. This was a Windows-only e2e flake because Linux file I/O finishes loadAllFileData() faster than the user can click the next toggle in tests. On Windows runners the reload outlasts the next click handler, so the race fires deterministically. Different tests fail across runs (commit-selection / round-complete / threading) — anything that triggers a reload then immediately follows up with another state change. Fix: track (scope, commit) per in-flight reload. Same key → dedup (existing behaviour, collapses double-triggers like SSE base-changed landing during a manual click). Different key → chain after the current reload finishes via .then(reloadForScope), so each distinct request actually runs.

codecov · 2026-05-07T11:54:18Z

Codecov Report

❌ Patch coverage is 26.66667% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.94%. Comparing base (3e31cd4) to head (3c6d87e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
process_group_windows.go	0.00%	54 Missing ⚠️
watch.go	94.11%	1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (26.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
- Coverage   69.27%   68.94%   -0.34%     
==========================================
  Files          43       45       +2     
  Lines       10817    10902      +85     
==========================================
+ Hits         7494     7516      +22     
- Misses       2755     2818      +63     
  Partials      568      568

Flag	Coverage Δ
e2e	`32.04% <17.33%> (-0.15%)`	⬇️
unit	`67.14% <95.23%> (+0.03%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

TestClientExitsOnFinish only killed the client process in t.Cleanup, but crit's daemon is intentionally detached and survives the client's exit. On Windows the surviving daemon's cwd handle on repoDir caused t.TempDir RemoveAll to fail with "file in use by another process" — the test would pass but Go reported the cleanup error and failed the run. Read the daemon PID from the session file (waitForDaemonPort already parses it; rename to waitForDaemonSession and return the full entry), then terminate it in t.Cleanup using the same terminate→poll→Kill escalation stopDaemon already uses. Once the daemon process is gone its file handles are released and RemoveAll succeeds. POSIX would have ignored the open handles and unlinked anyway; this manifested only on Windows.

Killing the daemon and polling processExists wasn't enough — Windows can delay releasing the cwd directory handle for tens of milliseconds after the process object reports exit, so t.TempDir's RemoveAll still hit "file in use by another process". Add a best-effort retry loop (40 × 50ms = up to 2s) on resolvedRepo inside the cleanup, gated to windows. Runs before t.TempDir's own cleanup via LIFO, so by the time the framework cleans up there's nothing left to delete and the spurious error is gone. POSIX is unaffected: it ignores open handles when unlinking.

terminateProcess on the daemon PID alone wasn't enough on Windows: crit's daemon polls `git status` once per second from watch.go and spawns git/gh/sl subprocesses during diff/PR work. Each child inherits repoDir as its cwd and holds an independent directory handle. Windows' TerminateProcess does not cascade to descendants, so any in-flight git child survived, kept the cwd locked, and t.TempDir's RemoveAll still hit "file in use by another process" — a 2s retry loop wasn't enough because a fresh git child can spawn every second the daemon is alive. Wrap the spawned crit (and therefore its whole tree) in a Job Object with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. Start the parent suspended, assign to the job, then NtResumeProcess — closes the spawn-then-assign race in which a child could escape the job. TerminateJobObject in cleanup brings the entire tree down atomically; the existing per-PID terminate path is kept as a safety net for the daemon itself. Unix is a no-op (SIGTERM to the daemon is sufficient there, and Unix doesn't hold cwd handles open the way Windows does). Drop the now-redundant 40×50ms RemoveAll retry loop — the Job Object kill is synchronous and complete.

SignalRoundComplete clears f.Comments synchronously and signals the watcher to process the round; the watcher then runs git status, loads disk, and calls carryForwardFileComments. carryForwardFileComments unconditionally cleared f.Comments before re-appending carried-forward PreviousComments, so any comment that landed between SignalRoundComplete and the watcher's run was wiped out. The race window is microseconds on Linux/macOS but widens on Windows because git status and file I/O take noticeably longer there. Symptom: agent posts a comment immediately after /api/round-complete returns, comment briefly appears, then disappears when the watcher finally processes the round; on the next finish-review the no-changes overlay fires because only carried-forward comments remain. The e2e-windows flake at round-complete.spec.ts:732 was the visible tip. Filter f.Comments to preserve entries whose ID is not in PreviousComments — those are brand-new for this round, not carry-forward targets. carryForwardAllComments had a parallel bug: the len(f.Comments) > 0 skip silently dropped PreviousComments for files that only contained race-window-added comments. Replace the count check with a CarriedForward marker check. This is a real backend correctness bug, not a test-only fix; surfacing it as Windows-only flakiness was an artifact of the platform's I/O latency, not the symptom's cause.

tomasz-tomczyk added 4 commits May 7, 2026 13:05

tomasz-tomczyk merged commit 34adf5e into main May 7, 2026
9 of 10 checks passed

tomasz-tomczyk deleted the fix-e2e-windows-flakes branch May 7, 2026 13:37

BrewTestBot mentioned this pull request May 8, 2026

crit 0.11.0 Homebrew/homebrew-core#281557

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: chain reloadForScope when scope/commit changes mid-flight#482

fix: chain reloadForScope when scope/commit changes mid-flight#482
tomasz-tomczyk merged 5 commits intomainfrom
fix-e2e-windows-flakes

tomasz-tomczyk commented May 7, 2026

Uh oh!

codecov Bot commented May 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

tomasz-tomczyk commented May 7, 2026

Summary

Review

Test plan

Uh oh!

codecov Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented May 7, 2026 •

edited

Loading