Deflake io-threads test by anchoring active-time bound to a per-reactivation baseline#3764
Deflake io-threads test by anchoring active-time bound to a per-reactivation baseline#3764Taeknology wants to merge 5 commits into
Conversation
…ivation baseline The test "Force the use of IO threads and assert active IO thread usage" asserted that each io_thread's used_active_time_io_thread counter stayed below 1 second after a 1-second sleep. The counter is monotonic across the server lifetime and never reset (src/io_threads.c), so by the time of that assertion it already includes work from the earlier activate_io_threads_and_wait call plus the io-threads 1 -> 5 toggle. On test-sanitizer-address-gcc and test-almalinux8-tls-module CI lanes the cumulative total exceeds 1.0 s and the assertion fails (e.g. "Expected 1.619448 < (1000/1000)"). Capture per-thread used_active_time immediately before the sleep and bound the post-sleep delta against that snapshot. This preserves the original "IO threads should idle when not activated" invariant from valkey-io#2463 while anchoring it to this reactivation. Also flip /1000 to /1000.0 to avoid Tcl integer-division surprises. Fixes valkey-io#3727. Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
|
Local model verification — per-thread numbers from an instrumented run with 4 extra
Three observations:
Local arm64 cost per activation is ~0.003 s; the failing CI message (1.619448 across two activations) implies ~0.81 s per activation, i.e. ~270x slower than my local. That's why the absolute |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe test ChangesIO thread active time measurement fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/io-threads.tcl`:
- Around line 122-125: Remove the stale baseline assertion that compares
used_active_time against initial_active_times (the monotonic counter makes this
flaky); in the test block containing used_active_time, initial_active_times,
pre_sleep_active_times and sleep_time_ms, delete the assertion line referencing
initial_active_times and keep only the assertion that checks (used_active_time -
pre_sleep_active_times($i)) < (sleep_time_ms/1000.0), and update or simplify the
surrounding comment so it only documents bounding activity attributable to this
reactivation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 295c4bd3-df16-402e-bb9a-84d1a4ab079b
📒 Files selected for processing (1)
tests/unit/io-threads.tcl
The delta-from-initial-baseline assertion measures cumulative active time across the entire test (toggle + sleep + final activation). As the monotonic counter (src/io_threads.c:27) accumulates, this bound is fragile: PR valkey-io#3124 had to add valgrind:skip after the same form failed under valgrind. The remaining delta-from-pre-sleep assertion covers the real invariant — io_threads should idle during sleep and only the final reactivation contributes to the post-sleep delta. Addresses review feedback on valkey-io#3764. Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
|
Good point. Dropped the |
rainsupreme
left a comment
There was a problem hiding this comment.
Looks good! Fixes the measurement methodology.
nit: would like to see 100x github workflow run showing the test is no longer flaky (not a blocker for me though)
sarthakaggarwal97
left a comment
There was a problem hiding this comment.
This looks good to me. I would although like to see a 100x run of the flaky tests in that environment to be sure this solves it.
Please refer to this section on how to run daily tests multiple times: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#running-the-daily-workflow-on-demand-for-your-branch
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3764 +/- ##
============================================
+ Coverage 76.68% 76.78% +0.10%
============================================
Files 162 162
Lines 81021 81021
============================================
+ Hits 62129 62212 +83
+ Misses 18892 18809 -83 🚀 New features to boost your workflow:
|
|
I ran a targeted Daily workflow in my fork to stress-test this PR’s io-threads change with sanitizer jobs and repeated loops: https://github.com/sarthakaggarwal97/valkey/actions/runs/28386791964/job/84103406744 The failure is different from the original issue this PR was trying to fix. The original problem was that the test compared However, the targeted Daily run shows that the new delta-based assertion is still too strict under sanitizer stress runs. The failing assertion is: With So I think the PR is directionally fixing the right bug, but the replacement invariant still needs adjustment. |
Fixes #3727.
Why this fails
The test
"Force the use of IO threads and assert active IO thread usage"intests/unit/io-threads.tclchecked that eachused_active_time_io_thread_*stayed below 1 second after a 1-second sleep. The counter atsrc/io_threads.c:27is a_Atomic long longthat is monotonic across the server lifetime and never reset — line 96 of the test (assert_morethanafter theio-threads 1 -> 5toggle round-trip) already relies on exactly that property.By the time of the post-sleep assertion the counter includes:
activate_io_threads_and_wait(line 67) — 16 clients opened, client 0 silent, 15 senders × 32 INCRs = 480 ops.io-threads 1 -> 5toggle.activate_io_threads_and_wait(line 108) — another 480 ops.On
test-sanitizer-address-gccandtest-almalinux8-tls-modulethe cumulative total exceeds 1.0 s; that's the1.619448observed in the failure message.expr {$sleep_time_ms/1000}is also Tcl integer division (=1), so the bound is tighter than the comment suggests.Fix
Capture per-thread
used_active_timeimmediately beforeafter $sleep_time_msand bound the post-sleep delta against that snapshot. Preserves the global "IO threads should idle when not activated" invariant that line 116 was protecting (introduced in #2463), but anchors it to this reactivation. Also flips/1000to/1000.0to avoid the integer-division surprise.The new
r infosnapshot itself ticks workers by sub-millisecond becauseio-threads-always-active yesstays on for this test; the/1000.0conversion preserves enough precision for the< 1.0 stolerance to absorb it.Follow-up commit (6537310) also drops the now-redundant
delta_from_initialassertion: it measures cumulative active time across the whole test and is fragile by the same monotonic-counter mechanism — PR #3124 hit this exact form under valgrind (Expected (7.94 - 5.87) < (1000/1000)) and worked around it withvalgrind:skip. The per-reactivationdelta_from_pre_sleepbound is now the single source of truth for this flake class.Alternatives considered
sanitizer:skip(like Skip active io time test if valgrind is used #3124) — erodes CI coverage on two important lanes; Skip active io time test if valgrind is used #3124 removed a flake without a causal model, but here one is available.delta_from_initialassertion that is itself fragile.Verification
Locally on
darwin/arm64:./runtest --single unit/io-threads→Force the use of IO threads and assert active IO thread usage (6164 ms)passes under both vanilla andSANITIZER=addressbuilds, plus the existing memory-leak check.Verified the monotonic-accumulation model with an instrumented test run (4 extra activations before the pre-sleep snapshot,
puts stderrof cumulative / pre-sleep / post values); per-thread numbers in a follow-up comment below. I don't have an ASan/TLS-module Linux environment locally, so the absolute< 1.0 sthreshold crossing itself is left to the CI matrix.