Skip to content

test(data): de-flake DeviceLinkRepositoryImplTest by running on the wall clock#5883

Merged
jamesarich merged 1 commit into
mainfrom
claude/determined-leavitt-8a9b1c
Jun 20, 2026
Merged

test(data): de-flake DeviceLinkRepositoryImplTest by running on the wall clock#5883
jamesarich merged 1 commit into
mainfrom
claude/determined-leavitt-8a9b1c

Conversation

@jamesarich

Copy link
Copy Markdown
Collaborator

Why

DeviceLinkRepositoryImplTest.reconcilePrunesShortCodesNoLongerInCatalog fails intermittently in CI. It is a test-harness flake, not a product bug — the repository logic is correct; the test mixed runTest's virtual clock with Room's real dispatcher.

Root cause

reconcile() guards its network fetch with withTimeoutOrNull(5s), whose deadline is measured against the calling coroutine's clock. The test ran on runTest + UnconfinedTestDispatcher (virtual time), while Room executes its DB work on the real ioDispatcher — which runTest neither owns nor awaits. Each time the coroutine parked on a real IO thread, the test scheduler went idle and runTest fast-forwarded virtual time; under load it jumped past the 5s budget, so the fetch was reported as timed out, store() was skipped, and the cache kept the pre-prune rows [a, b] instead of [a].

This is purely a test artifact: in production the timeout runs on the wall clock against a real network call, so it never misfires.

The previously-added multiConnection = false for in-memory DBs was a separate — and on Room 3.0.0-rc01, moot — mitigation: Room already forces a single connection for in-memory databases (name == null) regardless of the pool config, so it was never what made or broke this test.

Changes

🐛 Bug Fixes

  • Run DeviceLinkRepositoryImplTest on real dispatchers + runBlocking instead of UnconfinedTestDispatcher + runTest, so reconcile()'s withTimeoutOrNull is measured on the wall clock (matching production) and runTest's virtual-clock fast-forward can no longer spuriously expire it.

🧹 Chores

  • Correct the configureCommon KDoc: clarify that Room 3 always serves in-memory databases from a single connection regardless of the pool config, and that this test's determinism comes from the wall clock, not from multiConnection.

Testing Performed

  • The flake only surfaces under concurrent IO-pool contention (like CI), so a temporary 24-thread stress harness was used to reproduce it. On main it failed ~0.3% of runs, and the failure count exactly matched the network refresh timed out log count — confirming the timeout-misfire mechanism. With this change, ~18,000 concurrent + ~15,000 sequential runs completed with zero failures, timedOut=0. The harness was removed before this PR.
  • Full baseline: ./gradlew spotlessApply spotlessCheck detekt assembleDebug test allTestsBUILD SUCCESSFUL; all 6 DeviceLinkRepositoryImplTest methods pass.

…all clock

reconcilePrunesShortCodesNoLongerInCatalog flaked in CI. reconcile()
guards its network fetch with withTimeoutOrNull, whose 5s deadline is
measured against the calling coroutine's clock. The test drove that
clock with runTest/UnconfinedTestDispatcher (virtual time) while Room
executes its DB work on the real ioDispatcher, which runTest neither
owns nor awaits. Whenever the coroutine parked on a real IO thread the
test scheduler idled and runTest fast-forwarded virtual time; under
load it jumped past the 5s budget, so the fetch was reported as timed
out, store() was skipped, and the cache kept the pre-prune rows
[a, b] instead of [a].

A 24-thread stress harness reproduced this at ~0.3%/run, and the
failure count exactly matched the "network refresh timed out" log
count. Switching the test to real dispatchers + runBlocking, so the
timeout is measured on the wall clock as in production, drove ~18k
concurrent runs to zero failures.

Also corrects the configureCommon KDoc: Room already forces a single
connection for in-memory databases regardless of the pool config, so
multiConnection=false is not what makes this test deterministic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added the testing Test additions or modifications label Jun 20, 2026
@jamesarich jamesarich enabled auto-merge June 20, 2026 16:50
@jamesarich jamesarich added this pull request to the merge queue Jun 20, 2026
Merged via the queue into main with commit b314a0c Jun 20, 2026
19 checks passed
@jamesarich jamesarich deleted the claude/determined-leavitt-8a9b1c branch June 20, 2026 16:58
jamesarich added a commit to jeremiah-k/Meshtastic-Android that referenced this pull request Jun 20, 2026
Brings the branch up to date with main and resolves the lone conflict in
DeviceLinkRepositoryImplTest.kt by taking main's wall-clock harness
(runBlocking + Dispatchers.Unconfined, from meshtastic#5883). The branch's
runTest(UnconfinedTestDispatcher) variant would reintroduce the Room
ioDispatcher virtual-clock flake meshtastic#5883 fixed; the test set is otherwise
identical, so no coverage is lost. The refresh-timeout / NonCancellable
production changes merge cleanly and are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Test additions or modifications

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant