Skip to content

fix: chain reloadForScope when scope/commit changes mid-flight#482

Merged
tomasz-tomczyk merged 5 commits intomainfrom
fix-e2e-windows-flakes
May 7, 2026
Merged

fix: chain reloadForScope when scope/commit changes mid-flight#482
tomasz-tomczyk merged 5 commits intomainfrom
fix-e2e-windows-flakes

Conversation

@tomasz-tomczyk
Copy link
Copy Markdown
Owner

Summary

  • The reloadInFlight dedup guard in reloadForScope() returned the in-flight promise unconditionally, so a click that swapped scope while a previous reload was still running awaited the OLD scope's promise. The new scope's /api/session was never fetched — page stayed on the previous scope's files while the toggle visually flipped.
  • Surfaced as a Windows-only e2e flake. Linux file I/O finishes loadAllFileData() faster than the next click, so the dedup branch was rarely hit. Windows runners are slow enough that the reload outlasts the next click handler, and the race fired deterministically — which is why different tests failed across runs (commit-selection, round-complete, threading).
  • Fix: track (scope, commit) per in-flight reload. Same key → dedup (existing behaviour, collapses double-triggers like SSE base-changed landing during a manual click). Different key → chain after the current reload finishes via .then(reloadForScope, reloadForScope). The rejection arm prevents a failed reload from poisoning subsequent scope changes.

Review

  • Code review: passed (intent clean, frontend-expert walked the chain timing — re-entry after finally runs against reloadInFlight === null, so chained calls re-read diffScope/diffCommit and dispatch correctly)
  • Parity audit: N/A — crit-web/ doesn't have scope toggles (single shared review viewer)

Test plan

  • commit-selection.spec.ts runs locally against the freshly built binary: 13/13 pass on macOS including the previously-flaky :111:7 "commit picker reappears when switching back to All scope".
  • ESLint clean. gofmt, go vet, golangci-lint all clean (no Go changes).
  • The real verification is the Windows CI run — e2e-windows has been red on main since feat: add Windows + WSL support #459 landed; this PR should turn it green.

🤖 Generated with Claude Code

The dedup guard returned the in-flight reload promise unconditionally,
so a click that swapped scope while a previous reload was still running
awaited the OLD scope's promise. The new scope's /api/session was never
fetched, leaving the page rendered with the previous scope's files but
the new toggle visually active.

This was a Windows-only e2e flake because Linux file I/O finishes
loadAllFileData() faster than the user can click the next toggle in
tests. On Windows runners the reload outlasts the next click handler,
so the race fires deterministically. Different tests fail across runs
(commit-selection / round-complete / threading) — anything that triggers
a reload then immediately follows up with another state change.

Fix: track (scope, commit) per in-flight reload. Same key → dedup
(existing behaviour, collapses double-triggers like SSE base-changed
landing during a manual click). Different key → chain after the current
reload finishes via .then(reloadForScope), so each distinct request
actually runs.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 26.66667% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.94%. Comparing base (3e31cd4) to head (3c6d87e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
process_group_windows.go 0.00% 54 Missing ⚠️
watch.go 94.11% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (26.66%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
- Coverage   69.27%   68.94%   -0.34%     
==========================================
  Files          43       45       +2     
  Lines       10817    10902      +85     
==========================================
+ Hits         7494     7516      +22     
- Misses       2755     2818      +63     
  Partials      568      568              
Flag Coverage Δ
e2e 32.04% <17.33%> (-0.15%) ⬇️
unit 67.14% <95.23%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

TestClientExitsOnFinish only killed the client process in t.Cleanup, but
crit's daemon is intentionally detached and survives the client's exit.
On Windows the surviving daemon's cwd handle on repoDir caused t.TempDir
RemoveAll to fail with "file in use by another process" — the test would
pass but Go reported the cleanup error and failed the run.

Read the daemon PID from the session file (waitForDaemonPort already
parses it; rename to waitForDaemonSession and return the full entry),
then terminate it in t.Cleanup using the same terminate→poll→Kill
escalation stopDaemon already uses. Once the daemon process is gone its
file handles are released and RemoveAll succeeds.

POSIX would have ignored the open handles and unlinked anyway; this
manifested only on Windows.
Killing the daemon and polling processExists wasn't enough — Windows can
delay releasing the cwd directory handle for tens of milliseconds after
the process object reports exit, so t.TempDir's RemoveAll still hit
"file in use by another process".

Add a best-effort retry loop (40 × 50ms = up to 2s) on resolvedRepo
inside the cleanup, gated to windows. Runs before t.TempDir's own
cleanup via LIFO, so by the time the framework cleans up there's nothing
left to delete and the spurious error is gone.

POSIX is unaffected: it ignores open handles when unlinking.
terminateProcess on the daemon PID alone wasn't enough on Windows:
crit's daemon polls `git status` once per second from watch.go and
spawns git/gh/sl subprocesses during diff/PR work. Each child inherits
repoDir as its cwd and holds an independent directory handle. Windows'
TerminateProcess does not cascade to descendants, so any in-flight git
child survived, kept the cwd locked, and t.TempDir's RemoveAll still
hit "file in use by another process" — a 2s retry loop wasn't enough
because a fresh git child can spawn every second the daemon is alive.

Wrap the spawned crit (and therefore its whole tree) in a Job Object
with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. Start the parent suspended,
assign to the job, then NtResumeProcess — closes the spawn-then-assign
race in which a child could escape the job. TerminateJobObject in
cleanup brings the entire tree down atomically; the existing per-PID
terminate path is kept as a safety net for the daemon itself.

Unix is a no-op (SIGTERM to the daemon is sufficient there, and Unix
doesn't hold cwd handles open the way Windows does).

Drop the now-redundant 40×50ms RemoveAll retry loop — the Job Object
kill is synchronous and complete.
SignalRoundComplete clears f.Comments synchronously and signals the
watcher to process the round; the watcher then runs git status, loads
disk, and calls carryForwardFileComments. carryForwardFileComments
unconditionally cleared f.Comments before re-appending carried-forward
PreviousComments, so any comment that landed between SignalRoundComplete
and the watcher's run was wiped out.

The race window is microseconds on Linux/macOS but widens on Windows
because git status and file I/O take noticeably longer there.
Symptom: agent posts a comment immediately after /api/round-complete
returns, comment briefly appears, then disappears when the watcher
finally processes the round; on the next finish-review the no-changes
overlay fires because only carried-forward comments remain. The
e2e-windows flake at round-complete.spec.ts:732 was the visible tip.

Filter f.Comments to preserve entries whose ID is not in
PreviousComments — those are brand-new for this round, not carry-forward
targets. carryForwardAllComments had a parallel bug: the
len(f.Comments) > 0 skip silently dropped PreviousComments for files
that only contained race-window-added comments. Replace the count check
with a CarriedForward marker check.

This is a real backend correctness bug, not a test-only fix; surfacing
it as Windows-only flakiness was an artifact of the platform's I/O
latency, not the symptom's cause.
@tomasz-tomczyk tomasz-tomczyk merged commit 34adf5e into main May 7, 2026
9 of 10 checks passed
@tomasz-tomczyk tomasz-tomczyk deleted the fix-e2e-windows-flakes branch May 7, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant