Fix flaky actor-startup tests with deterministic readiness barriers (#1409, #1378) #1410
Merged
Aaronontheweb merged 3 commits on Jun 15, 2026
…er (netclaw-dev#1409)

Three tests in BackgroundJobManagerActorTests were polling side effects of startup reconciliation with fixed wall-clock budgets (AwaitAssertAsync(5s), ExpectMsgAsync(10s)). Under parallel CI load the shared ThreadPool is saturated, so the Reconcile mailbox message can sit unscheduled past the budget.

Add GetBackgroundJobManagerHealth / BackgroundJobManagerHealthResponse to the job protocol (mirrors the GetReminderHealthQuery pattern from reminders). Wire Receive<GetBackgroundJobManagerHealth> in BackgroundJobManagerActor, replying immediately with active/queued counts.

Because PreStart does Self.Tell(Reconcile.Instance), the mailbox order is always Reconcile -> HealthAsk. A successful health reply proves reconciliation ran to completion before any assertion fires.

Updated tests:
- StartupReconciliation_DeliversLostNotificationToOwningSession: 30s health Ask barrier before ExpectMsgAsync
- StartupReconciliation_MarksOrphanedJobsAsLost: 30s health Ask barrier, removed AwaitAssertAsync poll, direct assertions
- StartupReconciliation_EmitsAlert_ForLegacyJobMissingTrustFields: 30s health Ask barrier, removed AwaitAssertAsync poll, direct assertion
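The barrier described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `GetHealth`, `HealthResponse`, `JobManagerSketch`, and `BarrierDemo` are hypothetical stand-ins for the real protocol and actor types, the counts are placeholders, and the sketch takes the commit's mailbox-ordering claim as given.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical stand-in for GetBackgroundJobManagerHealth (singleton query message).
public sealed class GetHealth
{
    public static readonly GetHealth Instance = new();
    private GetHealth() { }
}

// Hypothetical stand-in for BackgroundJobManagerHealthResponse; field names are assumptions.
public sealed record HealthResponse(int ActiveCount, int QueuedCount);

public sealed class JobManagerSketch : ReceiveActor
{
    // Internal startup message, sent to Self from PreStart.
    private sealed class Reconcile
    {
        public static readonly Reconcile Instance = new();
        private Reconcile() { }
    }

    private int _active;
    private int _queued;

    public JobManagerSketch()
    {
        Receive<Reconcile>(_ =>
        {
            // Real reconciliation work (marking orphaned jobs as lost, etc.) would go here.
            _active = 0;
            _queued = 0;
        });

        // Replies immediately; the reply itself is the readiness signal for the test.
        Receive<GetHealth>(_ => Sender.Tell(new HealthResponse(_active, _queued)));
    }

    protected override void PreStart() => Self.Tell(Reconcile.Instance);
}

public static class BarrierDemo
{
    // Waits on the health reply with a generous timeout instead of polling side effects.
    public static async Task<HealthResponse> RunAsync()
    {
        var system = ActorSystem.Create("barrier-sketch");
        try
        {
            var manager = system.ActorOf(Props.Create(() => new JobManagerSketch()), "jobs");
            return await manager.Ask<HealthResponse>(GetHealth.Instance, TimeSpan.FromSeconds(30));
        }
        finally
        {
            await system.Terminate();
        }
    }
}
```

The 30s timeout only bounds how long a stuck test waits before failing. A healthy run returns as soon as the actor processes the query, so a large budget should not slow passing tests.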
…dev#1378)

DailyStatsActorTests (closes netclaw-dev#1378): raise both Ask timeouts from 3s to 15s. The query handler opens a SqliteConnection synchronously on the actor mailbox thread. On Windows CI, cold-file open (Defender scan + NTFS fsync + connection pool eviction from parallel Dispose calls) combined with ThreadPool hill-climb delay from concurrent TestKit ActorSystems can exhaust the 3s budget. Fifteen seconds covers the worst-case cold-open path without affecting pass-latency.

SessionMemoryObserverActorTests: make CreateObserverWithParentProbeAsync truly async. The previous implementation blocked the calling thread with .Result on an Ask<IActorRef>. Changed to an async Task return type and passed the test CancellationToken through. Updated all seven call sites to await the helper.
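The helper change in SessionMemoryObserverActorTests follows a common pattern, sketched below under assumptions: `CreateObserver`, `ParentSketch`, and `ObserverTestHelpers` are hypothetical names, the 5s timeout is illustrative, and the real helper's signature may differ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical request message; the real test helper's protocol is not shown in the PR text.
public sealed record CreateObserver(string Name);

// Hypothetical parent that creates a child and replies with its IActorRef.
public sealed class ParentSketch : ReceiveActor
{
    public ParentSketch()
    {
        Receive<CreateObserver>(msg =>
        {
            var child = Context.ActorOf(Props.Empty, msg.Name);
            Sender.Tell(child);
        });
    }
}

public static class ObserverTestHelpers
{
    // Blocking form (the shape being replaced): ties up the calling thread until the reply arrives.
    //   var observer = parent.Ask<IActorRef>(new CreateObserver(name), timeout).Result;

    // Async form: returns a Task and lets the test's CancellationToken abort the wait.
    public static async Task<IActorRef> CreateObserverAsync(
        IActorRef parent, string name, CancellationToken ct)
    {
        return await parent.Ask<IActorRef>(
            new CreateObserver(name), TimeSpan.FromSeconds(5), ct);
    }
}
```

A call site then becomes `var observer = await ObserverTestHelpers.CreateObserverAsync(parent, "observer-1", ct);`, which keeps the test thread free while the parent actor processes the request.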
…rivate ctor

- Replace DateTimeOffset.UtcNow with TimeProvider.System.GetUtcNow() in two test fixture definitions (StartupReconciliation_DeliversLostNotification and StartupReconciliation_MarksOrphanedJobsAsLost). CLAUDE.md requires TimeProvider throughout so time can be virtualized in tests.
- Move NetclawPaths creation and the EnsureDirectoriesExist() call before the File.WriteAllText in StartupReconciliation_EmitsAlert_ForLegacyJobMissingTrustFields. The previous order silently depended on ConfigureAkka having already created the jobs/ directory; now the directory is explicitly created before use.
- Add a private constructor to GetBackgroundJobManagerHealth to match the singleton enforcement pattern used by GetActiveEntityIds.
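To illustrate why the TimeProvider rule matters for fixtures, here is a small sketch that uses only the .NET 8 base class library. `FixedTimeProvider`, `JobRecordSketch`, and `JobFixtures` are hypothetical names invented for this example; the PR's actual fixtures call `TimeProvider.System.GetUtcNow()` directly.

```csharp
using System;

// Hypothetical fixed clock; a hand-rolled alternative to a testing library's fake provider.
public sealed class FixedTimeProvider : TimeProvider
{
    private readonly DateTimeOffset _now;

    public FixedTimeProvider(DateTimeOffset now) => _now = now;

    // Always returns the same instant, so fixture timestamps are reproducible.
    public override DateTimeOffset GetUtcNow() => _now;
}

// Hypothetical fixture record; the real job definition type is not shown in the PR text.
public sealed record JobRecordSketch(string JobId, DateTimeOffset CreatedAt);

public static class JobFixtures
{
    // Taking a TimeProvider (rather than reading DateTimeOffset.UtcNow) lets a test
    // swap in a virtual clock; production code can pass TimeProvider.System.
    public static JobRecordSketch Create(string jobId, TimeProvider time) =>
        new(jobId, time.GetUtcNow());
}
```

With this shape, `JobFixtures.Create("job-1", TimeProvider.System)` behaves like the wall-clock version, while a test can pass a `FixedTimeProvider` and assert on an exact timestamp.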
Summary
Fixes the class of flaky tests described in #1409 where actor startup side effects (reconciliation, schema alerts, lost-job notifications) were awaited on fixed wall-clock budgets instead of deterministic signals. Also closes #1378 (DailyStatsActorTests timeout on Windows CI).
- `BackgroundJobManagerActor`: add `GetBackgroundJobManagerHealth` query + handler. Because `PreStart` does `Self.Tell(Reconcile.Instance)`, the mailbox order is always `[Reconcile → HealthAsk]`, so a successful health reply proves reconciliation ran to completion before any assertion fires.
- `BackgroundJobManagerActorTests`: replace `AwaitAssertAsync(5s)` / `ExpectMsgAsync(10s)` polls with a 30s health-Ask barrier, then direct assertions.
- `DailyStatsActorTests` (#1378, "Flaky: DailyStatsActorTests AskTimeoutException on Windows CI (cold SQLite open on actor mailbox)"): raise both Ask timeouts 3s → 15s. The query handler opens `SqliteConnection` synchronously on the mailbox thread; on Windows CI, cold-file open (Defender scan + NTFS fsync + pool drain) combined with ThreadPool hill-climb can exceed 3s.
- `SessionMemoryObserverActorTests`: make `CreateObserverWithParentProbeAsync` truly async (was blocking with `.Result`), thread `CancellationToken` through.

Code review fixes (3rd commit)
- Replace `DateTimeOffset.UtcNow` with `TimeProvider.System.GetUtcNow()` in two test fixture definitions (CLAUDE.md rule)
- Move `NetclawPaths` + `EnsureDirectoriesExist()` before `File.WriteAllText` in the legacy-alert test; this removes the implicit dependency on `ConfigureAkka` having already created the `jobs/` directory
- Add `private` constructor to `GetBackgroundJobManagerHealth` to match the singleton enforcement pattern used by `GetActiveEntityIds`

Test plan
- `BackgroundJobManagerActorTests` pass (startup-reconciliation tests no longer flake under parallel CI load)
- `DailyStatsActorTests` passes on Windows CI within the 15s window
- `SessionMemoryObserverActorTests` passes (async refactor is behaviorally identical)
- `dotnet slopwatch analyze`: no new violations
- `Add-FileHeaders.ps1 -Verify`: all headers present

Closes #1409
Closes #1378