fix(test): root-cause the recurring TempDir teardown CI flake#3403
Merged
Conversation
Windows/Linux CI intermittently reddened on t.TempDir auto-cleanup (RemoveAll: "being used by another process" on Windows, "directory not empty" on Linux) even though every assertion passed — across the boot legacy-migration tests and the desktop capabilities tests. #3371 blamed t.Chdir and fixed one test, but that same test kept failing afterward, so the real cause is a teardown race, not the cwd. Two independent fixes: 1. jobs.Manager.Close only cancelled the session context and returned without waiting for job goroutines to observe the cancel and exit, so a caller tearing down a t.TempDir could race a goroutine still holding a file under it. Close now waits on a WaitGroup — shutdown is graceful. 2. Add a robustTempDir test helper (boot + desktop) that retries RemoveAll for a short window, absorbing the brief post-Close interval where a background resource still holds a file under the dir. A dir that never frees is logged, not fatal, so a genuine leak stays visible.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Windows/Linux CI intermittently reddens on
t.TempDirauto-cleanup even though every assertion in the test passed:TempDir RemoveAll cleanup: ... The process cannot access the file because it is being used by another process.TempDir RemoveAll cleanup: ... directory not emptyIt hits different tests each run —
internal/bootlegacy-migration tests anddesktopcapabilities tests — which is why reruns kept landing on it (the recent[1/6]..[6/6]series got blocked on it repeatedly).#3371attributed this tot.Chdirand switched a single test toWorkspaceRoot, but that same test (TestBuildMigratesLegacySessionsFromConfigSessionDir) kept failing afterward — so the real cause is a teardown race, not the process cwd.Root cause
The failing tests all build a full
Controller. At teardown a background resource can still hold a file under the temp dir for a few milliseconds afterClose()returns, andt.TempDir's automaticRemoveAllraces it.Fixes (two, independent)
jobs.Manager.Closeis now graceful. It previously onlycancel()-ed the session context and returned without waiting for job goroutines to observe the cancel and exit. Added async.WaitGroup:StartdoesAdd(1)/defer Done(),Closedoescancel()thenWait(). Jobs observe the cancel through their run context (exec.CommandContextkills a bash job's process), so the wait is bounded.robustTempDirtest helper (boot + desktop) that retriesRemoveAllfor a short window, absorbing the brief post-Closeinterval where a background resource still holds a file. A dir that never frees ist.Logf-ed, not fatal, so a genuine leak stays visible without reintroducing the flake.Verification
gofmtclean;go vetclean.internal/jobs,internal/control,internal/bootpass; the two previously-flaky boot tests pass-count=200locally.Follow-up
Unblocks #3367 and #3370 (the last two of the
[1/6]..[6/6]series), which were red purely on this flake. Once this lands, those two get rebased onto it and should go green normally.