Skip to content

IO-Threads redesign cleanup work#3544

Merged
madolson merged 11 commits into
valkey-io:unstablefrom
akashkgit:iothreads
May 12, 2026
Merged

IO-Threads redesign cleanup work#3544
madolson merged 11 commits into
valkey-io:unstablefrom
akashkgit:iothreads

Conversation

@akashkgit

@akashkgit akashkgit commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Follow-ups to the IO-threads redesign (#3324 / #3367). Bundles several independent fixes and refactors:

  • tryOffloadFreeObjToIOThreads: only offload the free of the sds buffer (which the I/O thread allocated). The robj itself is main-thread-owned and is now freed on the main thread.
  • trySendWriteToIOThreads: defer clearing last_header until after a successful enqueue, so the main thread can't mutate a header the I/O thread is concurrently reading.
  • evictClients: drop the pending_freed / close_asap bookkeeping and rely on freeClient's return value to bump stat_evictedclients. The old accounting double-counted protected clients and could under-evict.
  • Make MPSC/SPMC/SPSC queue sizes a runtime parameter on *Init(q, size) instead of compile-time *_QUEUE_SIZE macros in queues.h. The I/O-threads sizes (16384 / 4096 / 4096) now live as constants in io_threads.c; tests pass their own sizes.
  • updateIOThreads uses io_shared_outbox.queue_size instead of the macro.
  • Add function-level doc comments to queues.h.
  • Rename IOThreadFreeArgv/IOThreadPoll → ioThreadFreeArgv/ioThreadPoll for consistency.
  • Remove unused job_handler typedef; fix static_assert message (7 → 8).
    clang-format pass; Makefile tab→spaces nit.
  • Replace the getrusage(RUSAGE_THREAD) sys/user CPU sampling (and its #ifdef RUSAGE_THREAD fallback) with server.stat_active_time. One portable signal, one threshold (IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT = 30) replaces the dual sys>30% / sys>5%+user>50% rules.
  • Renames STATS_METRIC_MAIN_THREAD_CPU_{SYS,USER} → STATS_METRIC_MAIN_THREAD_ACTIVE_TIME.

Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
@madolson

Copy link
Copy Markdown
Member

@akashkgit Can you mark it as a draft if it's not ready for review?

@madolson madolson moved this to In Progress in Valkey 9.1 Apr 27, 2026
@codecov

codecov Bot commented Apr 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.53731% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.70%. Comparing base (1d7224f) to head (1f88660).
⚠️ Report is 1 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 77.27% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3544      +/-   ##
============================================
+ Coverage     76.66%   76.70%   +0.03%     
============================================
  Files           162      162              
  Lines         80644    80645       +1     
============================================
+ Hits          61823    61856      +33     
+ Misses        18821    18789      -32     
Files with missing lines Coverage Δ
src/networking.c 92.53% <100.00%> (+0.39%) ⬆️
src/queues.c 99.35% <100.00%> (+0.02%) ⬆️
src/server.h 100.00% <ø> (ø)
src/unit/test_queues.cpp 99.53% <100.00%> (ø)
src/io_threads.c 75.71% <77.27%> (-0.28%) ⬇️

... and 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: akash kumar <akumdev@amazon.com>
akashkgit and others added 4 commits May 4, 2026 16:52
Signed-off-by: Akash Kumar <45854686+akashkgit@users.noreply.github.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Comment thread src/io_threads.c
#define IO_IGNITION_CPU_SYS 30.0
#define IO_IGNITION_CPU_SYS_LOW 5.0
#define IO_IGNITION_CPU_USER 50.0
#define IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT 30

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's still leave a comment for why this value is at 30%. We don't need to necessarily test this more comprehensively, but it would be good to at least acknowledge we haven't verified this is necessarily optimal.

@madolson madolson added run-benchmark run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) labels May 8, 2026
@madolson madolson linked an issue May 8, 2026 that may be closed by this pull request
akashkgit and others added 2 commits May 8, 2026 19:26
@github-actions

github-actions Bot commented May 8, 2026

Copy link
Copy Markdown

Benchmark ran on this commit: 05c90b5

RPS Benchmark Comparison: unstable vs 4b1aee5

No significant changes

No statistically significant changes detected across 16 test(s).

16 with no significant change

Click to expand full comparison tables

data_size = 16

% Change Test unstable 4b1aee5 unstable stats 4b1aee5 stats
+0±2% GET rps P1 T1 231K 231.9K n=3, σ=1.27K, CV=0.5%, CI95%=±1.4%, PI95%=±2.7% n=3, σ=688, CV=0.3%, CI95%=±0.7%, PI95%=±1.5%
+0±2% GET rps P1 T9 1.666M 1.67M n=3, σ=7.83K, CV=0.5%, CI95%=±1.2%, PI95%=±2.3% n=3, σ=12.6K, CV=0.8%, CI95%=±1.9%, PI95%=±3.7%
-1±7% GET rps P10 T1 1.25M 1.24M n=3, σ=17.5K, CV=1.4%, CI95%=±3.5%, PI95%=±6.9% n=3, σ=29.1K, CV=2.3%, CI95%=±5.8%, PI95%=±11.6%
+1±2% GET rps P10 T9 3.16M 3.18M n=3, σ=20.1K, CV=0.6%, CI95%=±1.6%, PI95%=±3.2% n=3, σ=13.4K, CV=0.4%, CI95%=±1.0%, PI95%=±2.1%
-0±1% SET rps P1 T1 222K 221.9K n=3, σ=1.01K, CV=0.5%, CI95%=±1.1%, PI95%=±2.3% n=3, σ=729, CV=0.3%, CI95%=±0.8%, PI95%=±1.6%
+0±3% SET rps P1 T9 1.58M 1.58M n=3, σ=15.2K, CV=1.0%, CI95%=±2.4%, PI95%=±4.8% n=3, σ=11.2K, CV=0.7%, CI95%=±1.8%, PI95%=±3.5%
-0±2% SET rps P10 T1 1.062M 1.061M n=3, σ=6.08K, CV=0.6%, CI95%=±1.4%, PI95%=±2.8% n=3, σ=2.97K, CV=0.3%, CI95%=±0.7%, PI95%=±1.4%
-1±1% SET rps P10 T9 2.21M 2.177M n=3, σ=12.3K, CV=0.6%, CI95%=±1.4%, PI95%=±2.8% n=3, σ=4.60K, CV=0.2%, CI95%=±0.5%, PI95%=±1.0%

data_size = 96

% Change Test unstable 4b1aee5 unstable stats 4b1aee5 stats
+1±3% GET rps P1 T1 227K 228K n=3, σ=2.85K, CV=1.3%, CI95%=±3.1%, PI95%=±6.2% n=3, σ=1.07K, CV=0.5%, CI95%=±1.2%, PI95%=±2.3%
-0.8±0.9% GET rps P1 T9 1.651M 1.638M n=3, σ=3.65K, CV=0.2%, CI95%=±0.5%, PI95%=±1.1% n=3, σ=4.66K, CV=0.3%, CI95%=±0.7%, PI95%=±1.4%
-2±3% GET rps P10 T1 1.24M 1.217M n=3, σ=15.7K, CV=1.3%, CI95%=±3.1%, PI95%=±6.3% n=3, σ=1.57K, CV=0.1%, CI95%=±0.3%, PI95%=±0.6%
-2±2% GET rps P10 T9 3.18M 3.12M n=3, σ=19.9K, CV=0.6%, CI95%=±1.6%, PI95%=±3.1% n=3, σ=18.8K, CV=0.6%, CI95%=±1.5%, PI95%=±3.0%
+1±2% SET rps P1 T1 219K 221K n=3, σ=1.90K, CV=0.9%, CI95%=±2.2%, PI95%=±4.3% n=3, σ=1.07K, CV=0.5%, CI95%=±1.2%, PI95%=±2.4%
-1±6% SET rps P1 T9 1.52M 1.50M n=3, σ=23.4K, CV=1.5%, CI95%=±3.8%, PI95%=±7.7% n=3, σ=24.9K, CV=1.7%, CI95%=±4.1%, PI95%=±8.2%
-0.2±0.5% SET rps P10 T1 1.052M 1.050M n=3, σ=1.60K, CV=0.2%, CI95%=±0.4%, PI95%=±0.8% n=3, σ=1.36K, CV=0.1%, CI95%=±0.3%, PI95%=±0.6%
-2±1% SET rps P10 T9 2.22M 2.183M n=3, σ=11.8K, CV=0.5%, CI95%=±1.3%, PI95%=±2.6% n=3, σ=3.54K, CV=0.2%, CI95%=±0.4%, PI95%=±0.8%

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30

Legend:

  • Test column: Command, metric, P=pipeline depth, T=io-threads
  • Significance: ✅ significant improvement, ❌ significant regression, ➖ not significant, ❔ insufficient data

Statistical Notes:

  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)
  • CI95%: 95% Confidence Interval - range where the true population mean is likely to fall
  • PI95%: 95% Prediction Interval - range where a single future observation is likely to fall

Comment thread src/io_threads.c Outdated
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
@madolson madolson changed the title Post-merge fixes for IO-Threads Redesign IO-Threads redesign cleanup work May 12, 2026
@madolson madolson merged commit fdd9039 into valkey-io:unstable May 12, 2026
68 of 71 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to To be backported in Valkey 9.1 May 12, 2026
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 12, 2026
Follow-ups to the IO-threads redesign (valkey-io#3324 / valkey-io#3367). Bundles several
independent fixes and refactors:

- tryOffloadFreeObjToIOThreads: only offload the free of the sds buffer
(which the I/O thread allocated). The robj itself is main-thread-owned
and is now freed on the main thread.
- trySendWriteToIOThreads: defer clearing last_header until after a
successful enqueue, so the main thread can't mutate a header the I/O
thread is concurrently reading.
- evictClients: drop the pending_freed / close_asap bookkeeping and rely
on freeClient's return value to bump stat_evictedclients. The old
accounting double-counted protected clients and could under-evict.
- Make MPSC/SPMC/SPSC queue sizes a runtime parameter on *Init(q, size)
instead of compile-time *_QUEUE_SIZE macros in queues.h. The I/O-threads
sizes (16384 / 4096 / 4096) now live as constants in io_threads.c; tests
pass their own sizes.
- updateIOThreads uses io_shared_outbox.queue_size instead of the macro.
- Add function-level doc comments to queues.h.
- Rename IOThreadFreeArgv/IOThreadPoll → ioThreadFreeArgv/ioThreadPoll
for consistency.
- Remove unused job_handler typedef; fix static_assert message (7 → 8).
clang-format pass; Makefile tab→spaces nit.
- Replace the getrusage(RUSAGE_THREAD) sys/user CPU sampling (and its
#ifdef RUSAGE_THREAD fallback) with server.stat_active_time. One
portable signal, one threshold (IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT =
30) replaces the dual sys>30% / sys>5%+user>50% rules.
- Renames STATS_METRIC_MAIN_THREAD_CPU_{SYS,USER} →
STATS_METRIC_MAIN_THREAD_ACTIVE_TIME.

---------

Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: Akash Kumar <45854686+akashkgit@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
lucasyonge pushed a commit that referenced this pull request May 12, 2026
Follow-ups to the IO-threads redesign (#3324 / #3367). Bundles several
independent fixes and refactors:

- tryOffloadFreeObjToIOThreads: only offload the free of the sds buffer
(which the I/O thread allocated). The robj itself is main-thread-owned
and is now freed on the main thread.
- trySendWriteToIOThreads: defer clearing last_header until after a
successful enqueue, so the main thread can't mutate a header the I/O
thread is concurrently reading.
- evictClients: drop the pending_freed / close_asap bookkeeping and rely
on freeClient's return value to bump stat_evictedclients. The old
accounting double-counted protected clients and could under-evict.
- Make MPSC/SPMC/SPSC queue sizes a runtime parameter on *Init(q, size)
instead of compile-time *_QUEUE_SIZE macros in queues.h. The I/O-threads
sizes (16384 / 4096 / 4096) now live as constants in io_threads.c; tests
pass their own sizes.
- updateIOThreads uses io_shared_outbox.queue_size instead of the macro.
- Add function-level doc comments to queues.h.
- Rename IOThreadFreeArgv/IOThreadPoll → ioThreadFreeArgv/ioThreadPoll
for consistency.
- Remove unused job_handler typedef; fix static_assert message (7 → 8).
clang-format pass; Makefile tab→spaces nit.
- Replace the getrusage(RUSAGE_THREAD) sys/user CPU sampling (and its
#ifdef RUSAGE_THREAD fallback) with server.stat_active_time. One
portable signal, one threshold (IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT =
30) replaces the dual sys>30% / sys>5%+user>50% rules.
- Renames STATS_METRIC_MAIN_THREAD_CPU_{SYS,USER} →
STATS_METRIC_MAIN_THREAD_ACTIVE_TIME.

---------

Signed-off-by: akash kumar <akumdev@amazon.com>
Signed-off-by: Akash Kumar <45854686+akashkgit@users.noreply.github.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
@sarthakaggarwal97 sarthakaggarwal97 moved this from To be backported to Done in Valkey 9.1 May 16, 2026
akashkgit added a commit to akashkgit/valkey that referenced this pull request May 18, 2026
This reverts commit fdd9039.

Signed-off-by: akash kumar <akumdev@amazon.com>
madolson pushed a commit that referenced this pull request May 18, 2026
This reverts the
[commit](fdd9039)
that was merged as part of the PR #3544 due to a performance regression
observed [here](#3750)

Signed-off-by: akash kumar <akumdev@amazon.com>
roshkhatri added a commit to roshkhatri/valkey that referenced this pull request Jun 8, 2026
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 9, 2026
This reverts the
[commit](fdd9039)
that was merged as part of the PR #3544 due to a performance regression
observed [here](#3750)

Signed-off-by: akash kumar <akumdev@amazon.com>
madolson pushed a commit that referenced this pull request Jun 9, 2026
This regression is still present in 9.1 GA as the cherrypick of revert
commit was missed during release.

Re-applies #3544 (reverted in #3756 due to ~20% SET regression) with the
performance fix from #3760.

**Root Cause:** The original #3544 changed
`tryOffloadFreeObjToIOThreads` to only offload the SDS buffer free to IO
threads, freeing the `robj` shell on the main thread. I carried out
profiling for the change and it showed that freeing the `robj` shell on
the main thread became the prime main-thread hotspot (~10% CPU), while
IO threads shifted from doing real `jemalloc` work to spinning idle on
`spmcDequeue`.

**Fix**: Keep `tryOffloadFreeObjToIOThreads` offloading the entire robj
(`decrRefCount`) to the IO thread. Cross-thread `zfree` is safe with
`jemalloc`.

This PR includes all cleanup work from #3544 so - 
- `trySendWriteToIOThreads`: defer clearing `last_header` until after
successful enqueue
- `evictClients`: simplified bookkeeping
- Queue sizes as runtime parameters instead of compile-time macros
- IO ignition policy using `stat_active_time` instead of `getrusage`
- Function renames (`IOThreadFreeArgv` --> `ioThreadFreeArgv`, etc.) and
doc comments

**Benchmark** on (Graviton4 c8gb.metal-48xl):
Config: SET, 128B values, 9 IO threads, pipeline=10, 1600 clients - Same
as Valkey official method
| Version | Throughput |
|---------|-----------|
| Unstable + original #3544 | ~1,554K rps |
| Unstable + this PR | ~2,116K rps |

<details>
<summary>Diff vs original #3544 (perf fix)</summary>

```diff
diff --git a/src/io_threads.c b/src/io_threads.c
--- a/src/io_threads.c
+++ b/src/io_threads.c
@@ // IO thread handler
 case JOB_REQ_FREE_OBJ:
-    zfree(data);
+    decrRefCount(data);
     break;

@@ // tryOffloadFreeObjToIOThreads
-    /* We offload only the free of the ptr that may be allocated by the I/O thread.
-     * The object itself was allocated by the main thread and will be freed by the main thread. */
-    void *job = tagJob(sdsAllocPtr(objectGetVal(obj)), JOB_REQ_FREE_OBJ);
+    void *job = tagJob(obj, JOB_REQ_FREE_OBJ);
     if (unlikely(spmcEnqueue(&io_shared_inbox, job) == false)) return C_ERR;
-    objectSetVal(obj, NULL);
-    decrRefCount(obj);
     io_jobs_submitted++;
```
</details>

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 10, 2026
This reverts the
[commit](fdd9039)
that was merged as part of the PR #3544 due to a performance regression
observed [here](#3750)

Signed-off-by: akash kumar <akumdev@amazon.com>
ranshid added a commit that referenced this pull request Jun 11, 2026
# Backport sweep for 9.1

Automated cherry-picks from PRs marked "To be backported".

## Applied

| Source PR | Title | Detail |
|---|---|---|
| #3743 | Fix buffered_reply assert in HFE commands with module keyspace
notifications | cherry-picked in a prior sweep |
| #3766 | Fix flaky block_keyspace_notification test for HGETDEL notify
race | cherry-picked in a prior sweep |
| #3800 | Fix heap-use-after-free in ACL LOAD when client free is
deferred | cherry-picked in a prior sweep |
| #3723 | Fix double-finish and RESP reply violation in cluster slot
migration | cherry-picked in a prior sweep |
| #3872 | Redacting customer information when hide_user_data_from_log is
true in rdb.c, networking.c, debug.c and t_hash | cherry-picked in a
prior sweep |
| #3846 | Fix use-after-free in VM_RegisterClusterMessageReceiver |
cherry-picked in a prior sweep |
| #3806 | Add ALL_DBS flag to CLUSTER FLUSHSLOT for database-level ACL |
cherry-picked in a prior sweep |
| #3847 | Harden SENTINEL commands and config rewrite against
control-character injection | |
| #3801 | Validate every DB clause in COPY against ACL db= permissions |
|
| #3851 | Replace AUTOMATION_PAT with Valkeyrie Bot GitHub App token | |
| #3848 | Fix cluster AUX-field control-character and delimiter
injection | |
| #3544 | Revert "IO-Threads redesign cleanup work (#3544)" |
cherry-picked in a prior sweep |
| #3888 | Report exact dbid for COPY in ACL LOG when db= access is
denied | conflicts resolved by Claude Code |
| #3942 | Fix shard_id format specifier in UPDATE message log |  |
| #3941 | Avoid random() % 0 undefined behaviour when
cluster-node-timeout < 30 | |

---
*Generated by valkey-ci-agent using Claude Code.*

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: chx9 <lovelypiska@outlook.com>
Signed-off-by: zackcam <zackcam@amazon.com>
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
Signed-off-by: Jules Lasarte <lasartej@amazon.com>
Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: lovelypiska <lovelypiska@outlook.com>
Co-authored-by: zackcam <zackcam@amazon.com>
Co-authored-by: eifrah-aws <eifrah@amazon.com>
Co-authored-by: jjuleslasarte <jules.lasarte@gmail.com>
Co-authored-by: Jules Lasarte <lasartej@amazon.com>
Co-authored-by: Akash Kumar <45854686+akashkgit@users.noreply.github.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 12, 2026
This reverts the
[commit](fdd9039)
that was merged as part of the PR #3544 due to a performance regression
observed [here](#3750)

Signed-off-by: akash kumar <akumdev@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[NEW] Redesign IO threading communication model pending tasks

3 participants