Fix IO-Threads redesign cleanup perf regression from #3544#3938
Conversation
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR refactors the queue infrastructure from compile-time fixed capacities to runtime-configurable sizes, integrates dynamic queue initialization into IO threading, shifts IO thread ignition policy from CPU-time to main-thread active-time metrics, simplifies client eviction logic, and updates unit tests accordingly. ChangesQueue Infrastructure and IO Threading Refactor
Client Eviction Loop Simplification
Build System
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/unit/test_queues.cpp (1)
25-33: ⚡ Quick winAdd invalid-size coverage for the new init contract.
These fixtures only exercise valid power-of-two sizes. Since
*Initnow accepts runtime input, please add death/error-path coverage for0and a non-power-of-two value so the mask invariant doesn't regress silently.As per coding guidelines, "Data-structure and low-level logic changes should be covered by unit tests."
Also applies to: 143-153, 256-265
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/unit/test_queues.cpp` around lines 25 - 33, Add unit-test coverage for the new runtime init contract by exercising spscInit with invalid sizes: call spscInit(&q, 0) and spscInit(&q, nonPowerOfTwo) and assert they fail (death/assert/error) so the mask invariant can't regress; update the test fixtures around SetUp/SPSC_QUEUE_SIZE and add analogous invalid-size checks in the other test sections referenced (around lines 143-153 and 256-265) using the same pattern and test helpers so that ConsumerArg/spscInit behavior is validated for both zero and a non-power-of-two size.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/networking.c`:
- Around line 6535-6544: The loop is using only current
server.stat_clients_type_memory to decide when to stop evicting and only
increments server.stat_evictedclients when freeClient(c) returns true, which can
cause over-eviction and missed counts for async closes; change the eviction loop
(the loop that references server.stat_clients_type_memory, CLIENT_TYPE_NORMAL,
CLIENT_TYPE_PUBSUB, client_eviction_limit and iterates with
listNext(&bucket_iter)) to use a projected/freed-bytes target so the condition
uses current_memory minus an accumulated freed_estimate instead of live memory
alone, and after calling freeClient(c) always account for the client as evicted
(increment server.stat_evictedclients) and update the projected freed bytes by
the client's memory footprint (or by adjusting server.stat_clients_type_memory
appropriately) whether freeClient returned synchronous (1) or async (0) so the
loop stops at the real target and async-closed clients are counted.
In `@src/queues.c`:
- Around line 12-16: The initializer(s) allow queue_size == 0 because the
power-of-two check passes for zero; update mpscInit (and the other two queue
initializer functions) to explicitly reject zero capacity by changing the
assertion to require queue_size != 0 and a power-of-two (e.g. assert(queue_size
!= 0 && (queue_size & (queue_size - 1)) == 0)); this ensures q->queue_size is
nonzero before setting q->buffer and using q->queue_size - 1 for masking.
---
Nitpick comments:
In `@src/unit/test_queues.cpp`:
- Around line 25-33: Add unit-test coverage for the new runtime init contract by
exercising spscInit with invalid sizes: call spscInit(&q, 0) and spscInit(&q,
nonPowerOfTwo) and assert they fail (death/assert/error) so the mask invariant
can't regress; update the test fixtures around SetUp/SPSC_QUEUE_SIZE and add
analogous invalid-size checks in the other test sections referenced (around
lines 143-153 and 256-265) using the same pattern and test helpers so that
ConsumerArg/spscInit behavior is validated for both zero and a non-power-of-two
size.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2428595b-cad3-43cd-bf15-c08bbf649833
📒 Files selected for processing (8)
src/Makefilesrc/io_threads.csrc/io_threads.hsrc/networking.csrc/queues.csrc/queues.hsrc/server.hsrc/unit/test_queues.cpp
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3938 +/- ##
============================================
- Coverage 76.68% 76.60% -0.09%
============================================
Files 162 162
Lines 80731 80727 -4
============================================
- Hits 61910 61842 -68
- Misses 18821 18885 +64
🚀 New features to boost your workflow:
|
|
@coderabbitai Can you please review this for the perf regression fix for #3544 (reverted in #3756 due to ~20% SET regression) with the performance fix from #3760. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Fetch the PR diff
gh pr view 3938 --repo valkey-io/valkey --json files,body,title
# Get the diff
gh pr diff 3938 --repo valkey-io/valkey 2>/dev/null | head -3000Length of output: 32984 🏁 Script executed: #!/bin/bash
# Check io_private_inbox initialization for thread 0
rg -n "io_private_inbox" --type c
# Check tryOffloadFreeObjToIOThreads full context
rg -n "tryOffloadFreeObjToIOThreads" --type c -A 20 -B 5
# Check freeClient return value semantics
rg -n "int freeClient" --type c -A 5
# Check stat_active_time usage
rg -n "stat_active_time" --type c -C 3
# Check decrRefCount for thread safety concerns
rg -n "decrRefCount" src/io_threads.c -C 3
# Check if copyright header was intentionally removed from queues.h
git log --oneline -5 -- src/queues.hLength of output: 10427
Overall AssessmentThe core perf fix is correct and well-motivated. The A few things worth addressing: 1.
|
| Area | Status |
|---|---|
Core perf fix (decrRefCount in IO thread) |
✅ Correct |
trySendWriteToIOThreads header race fix |
✅ Correct |
| Queue runtime sizing | ✅ Clean |
IO ignition policy (active time vs. getrusage) |
✅ Simpler, cross-platform |
evictClients log ordering |
|
stat_evictedclients semantics change |
|
queues.h copyright header removed |
🐛 Looks unintentional |
_Static_assert message fix |
✅ |
The three items above are the ones I'd recommend addressing before merge. The perf fix and cleanups are solid work.
- Restore copyright header in queues.h - Reject queue_size == 0 in queue initializers (0 passes power-of-two check) Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
|
Benchmark ran on this commit: RPS Benchmark Comparison: unstable vs 4a035db1 significant change(s)
15 with no significant change Click to expand full comparison tablesdata_size = 128
data_size = 16
Configuration:
Legend:
Statistical Notes:
|
|
@akashkgit please review when you get a chance. |
The `test io-threads are runtime modifiable` test in `tests/unit/other.tcl` times out on the dedicated Valgrind jobs of the daily CI, failing the run. The failing test is introduced in #3938. This PR reduces the loop to 10 iterations under Valgrind. **Failure links:** - https://github.com/valkey-io/valkey/actions/runs/27386948127 (Jun 12) - https://github.com/valkey-io/valkey/actions/runs/27315974006 (Jun 11) - https://github.com/valkey-io/valkey/actions/runs/27245311034 (Jun 10) --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
The `test io-threads are runtime modifiable` test in `tests/unit/other.tcl` times out on the dedicated Valgrind jobs of the daily CI, failing the run. The failing test is introduced in #3938. This PR reduces the loop to 10 iterations under Valgrind. **Failure links:** - https://github.com/valkey-io/valkey/actions/runs/27386948127 (Jun 12) - https://github.com/valkey-io/valkey/actions/runs/27315974006 (Jun 11) - https://github.com/valkey-io/valkey/actions/runs/27245311034 (Jun 10) --------- Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
This regression is still present in 9.1 GA as the cherrypick of revert commit was missed during release.
Re-applies #3544 (reverted in #3756 due to ~20% SET regression) with the performance fix from #3760.
Root Cause: The original #3544 changed
tryOffloadFreeObjToIOThreadsto only offload the SDS buffer free to IO threads, freeing therobjshell on the main thread. I carried out profiling for the change and it showed that freeing therobjshell on the main thread became the prime main-thread hotspot (~10% CPU), while IO threads shifted from doing realjemallocwork to spinning idle onspmcDequeue.Fix: Keep
tryOffloadFreeObjToIOThreadsoffloading the entire robj (decrRefCount) to the IO thread. Cross-threadzfreeis safe withjemalloc.This PR includes all cleanup work from #3544 so -
trySendWriteToIOThreads: defer clearinglast_headeruntil after successful enqueueevictClients: simplified bookkeepingstat_active_timeinstead ofgetrusageIOThreadFreeArgv-->ioThreadFreeArgv, etc.) and doc commentsBenchmark on (Graviton4 c8gb.metal-48xl):
Config: SET, 128B values, 9 IO threads, pipeline=10, 1600 clients - Same as Valkey official method
Diff vs original #3544 (perf fix)