Skip to content

Add design doc for I/O threads#4022

Merged
hpatro merged 2 commits into
valkey-io:unstablefrom
hpatro:io-threads-design-doc
Jun 24, 2026
Merged

Add design doc for I/O threads#4022
hpatro merged 2 commits into
valkey-io:unstablefrom
hpatro:io-threads-design-doc

Conversation

@hpatro

@hpatro hpatro commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Document the shared I/O threading infrastructure: SPMC inbox, MPSC outbox, per-worker SPSC inboxes, job lifecycle, drain semantics, live reconfiguration.

Will follow up with cluster io threads offload with #3438

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f1928f89-9229-49b7-8cfe-ea389da17010

📥 Commits

Reviewing files that changed from the base of the PR and between a765d89 and 54953dd.

📒 Files selected for processing (1)
  • design-docs/io-threads.md
✅ Files skipped from review due to trivial changes (1)
  • design-docs/io-threads.md

📝 Walkthrough

Walkthrough

A new design document design-docs/io-threads.md is added, covering Valkey's shared I/O threading infrastructure: threading model, queue topology (SPMC inbox, per-worker SPSC inboxes, MPSC outbox), tagged-pointer encoding, job kinds and lifecycle counters, outbox backpressure and drain barrier semantics, live CONFIG SET io-threads reconfiguration, and a source-file reference section.

Changes

I/O Threads Design Document

Layer / File(s) Summary
Overview and threading model
design-docs/io-threads.md
Introduces document purpose and scope; specifies the main-thread-owns-state rule, worker thread identification and pinning/parking mechanics, and the drain-before-reconfigure invariant for live updates.
Queue topology, tagged pointers, and job kinds
design-docs/io-threads.md
Defines SPMC shared inbox, per-worker SPSC private inboxes, and MPSC outbox; documents worker priority and batching/commit behavior; specifies tagged-pointer representation with alignment constraints; lists job request/result kinds and steps for integrating a new consumer with dispatch, worker, and completion handlers.
Job lifecycle, counters, and backpressure handling
design-docs/io-threads.md
Describes end-to-end job flow from main-thread dispatch through worker execution to response processing; defines submission/finish counters and pending-response statistics; explains worker-local outbox buffering and retry flushing under backpressure; details the synchronous drain barrier and per-client waitForClientIO() spin primitive.
Live reconfiguration and implementation mapping
design-docs/io-threads.md
Documents full CONFIG SET io-threads reconfiguration flow including overflow eligibility guard, drain-then-park-then-spawn steps, dynamic scaling with io-threads-always-active override, and a source-file reference mapping mechanisms to implementation entry points.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add design doc for I/O threads' directly and accurately summarizes the main change—adding a design document for I/O threading infrastructure.
Description check ✅ Passed The description is relevant to the changeset, specifying what infrastructure components are documented (SPMC inbox, MPSC outbox, per-worker SPSC inboxes, job lifecycle, drain semantics, live reconfiguration) and mentioning a follow-up effort.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 markdownlint-cli2 (0.22.1)
design-docs/io-threads.md

markdownlint-cli2 v0.22.1 (markdownlint v0.40.0)
Error: Unable to use configuration file '/coderabbit-0.markdownlint-cli2.jsonc'; ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at throwForConfigurationFile (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:48:9)
at readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:169:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
[cause]: Error: ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at async open (node:internal/fs/promises:640:25)
at async Object.readFile (node:internal/fs/promises:1287:14)
at async readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:141:17)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/coderabbit-0.markdownlint-cli2.jsonc'
}
}


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@hpatro

hpatro commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

@akashkgit would be great if you could take a look.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
design-docs/io-threads.md (2)

49-53: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Clarify the purpose and design rationale of per-worker private inboxes.

The document states that per-worker SPSC inboxes are used for "jobs that must run on a specific worker (e.g. argv frees pinned by cur_tid, poll jobs at high thread counts)." However, it doesn't explain why these jobs must run on a specific worker (thread affinity, per-worker state, cache locality, etc.) or how the main thread decides which worker to target. This information would help implementers understand when to use the shared vs. private inbox.

Consider adding a brief sentence on the design rationale.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-docs/io-threads.md` around lines 49 - 53, The table entry for
io_private_inbox[i] lacks sufficient explanation of its design rationale and
usage criteria. Expand the documentation to clarify why certain jobs must run on
a specific worker (such as thread affinity for argv frees pinned by cur_tid,
per-worker state management, or cache locality), how the main thread decides
which worker to target, and provide concrete guidance on when implementers
should route jobs to the private inbox versus the shared inbox for better
decision-making.

79-83: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Clarify eligibility validation and per-consumer dispatch requirements.

The integration checklist mentions that a new consumer must add "a try*ToIOThreads() dispatch helper on the main thread" that "validate[s] eligibility" (line 89), but the document doesn't explain what eligibility checks should be performed or what happens if eligibility fails. This could leave implementers uncertain about error handling and preconditions.

Consider expanding this section with a few examples of eligibility checks (e.g., client state, load thresholds, feature flags).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-docs/io-threads.md` around lines 79 - 83, The document mentions that
the `try*ToIOThreads()` dispatch helper on the main thread must validate
eligibility, but it does not explain what eligibility checks should be performed
or what behavior occurs when eligibility validation fails. Expand the
eligibility validation section (around line 89 where this is first mentioned) by
adding clarification and concrete examples of eligibility checks that new
consumers should consider, such as validating client state, checking load
thresholds, or verifying feature flags. Also explain the error handling or
fallback behavior when eligibility validation fails, so implementers understand
the expected preconditions and failure modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@design-docs/io-threads.md`:
- Around line 137-141: The functions VM_CreateCommand() and
moduleUnregisterCommands() both call drainIOThreadsQueue() without the necessary
outbox guard that exists in updateIOThreads(). Add the same safety
check—verifying that getPendingIOResponsesCount() is not greater than
io_shared_outbox.queue_size—before invoking drainIOThreadsQueue() in both
VM_CreateCommand() and moduleUnregisterCommands(). If the condition is violated,
either defer the operation or implement an appropriate error handling strategy
to prevent potential worker stalls.

---

Nitpick comments:
In `@design-docs/io-threads.md`:
- Around line 49-53: The table entry for io_private_inbox[i] lacks sufficient
explanation of its design rationale and usage criteria. Expand the documentation
to clarify why certain jobs must run on a specific worker (such as thread
affinity for argv frees pinned by cur_tid, per-worker state management, or cache
locality), how the main thread decides which worker to target, and provide
concrete guidance on when implementers should route jobs to the private inbox
versus the shared inbox for better decision-making.
- Around line 79-83: The document mentions that the `try*ToIOThreads()` dispatch
helper on the main thread must validate eligibility, but it does not explain
what eligibility checks should be performed or what behavior occurs when
eligibility validation fails. Expand the eligibility validation section (around
line 89 where this is first mentioned) by adding clarification and concrete
examples of eligibility checks that new consumers should consider, such as
validating client state, checking load thresholds, or verifying feature flags.
Also explain the error handling or fallback behavior when eligibility validation
fails, so implementers understand the expected preconditions and failure modes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a441b96-efa6-475f-bd8e-5403828af0df

📥 Commits

Reviewing files that changed from the base of the PR and between bee30d4 and 779b845.

📒 Files selected for processing (1)
  • design-docs/io-threads.md

Comment thread design-docs/io-threads.md
Comment on lines +137 to +141
> **Caveat:** `drainIOThreadsQueue()` does not itself drain `io_shared_outbox`.
> Callers must ensure the outbox cannot fill while draining, otherwise workers
> can stall on `flushPendingIOResponses` and the spin will not progress. The
> current guard is in `updateIOThreads()`, which refuses the resize if
> `getPendingIOResponsesCount() > io_shared_outbox.queue_size`.

@coderabbitai coderabbitai Bot Jun 23, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find all callers of drainIOThreadsQueue() and verify they have outbox guards

# Search for calls to drainIOThreadsQueue
rg -n 'drainIOThreadsQueue\(' src/ --type=c -B 10 -A 5

# For each call site, check if getPendingIOResponsesCount() is checked first
rg -n 'getPendingIOResponsesCount\|drainIOThreadsQueue' src/io_threads.c -B 5 -A 10

Repository: valkey-io/valkey

Length of output: 5216


🏁 Script executed:

# Get detailed context around the two module.c calls to drainIOThreadsQueue()
echo "=== Context around line 1420 (module.c) ==="
sed -n '1400,1440p' src/module.c

echo -e "\n=== Context around line 13251 (module.c) ==="
sed -n '13230,13270p' src/module.c

# Also check for any guards or conditions in the functions calling drainIOThreadsQueue in module.c
echo -e "\n=== Check function signatures and preceding checks ==="
sed -n '1390,1425p' src/module.c | head -40
sed -n '13240,13275p' src/module.c | head -40

Repository: valkey-io/valkey

Length of output: 7334


🏁 Script executed:

# Search for callers of VM_CreateCommand
echo "=== Callers of VM_CreateCommand ==="
rg -n 'VM_CreateCommand' src/ --type=c -B 3 -A 2

# Search for callers of moduleUnregisterCommands
echo -e "\n=== Callers of moduleUnregisterCommands ==="
rg -n 'moduleUnregisterCommands' src/ --type=c -B 3 -A 2

# Check module loading/unloading flow
echo -e "\n=== Module lifecycle functions ==="
rg -n 'loadModule|unloadModule|module.*load' src/ --type=c | head -20

Repository: valkey-io/valkey

Length of output: 5557


🏁 Script executed:

# Find moduleLoad and moduleUnload callers to understand when they're called
echo "=== Callers of moduleLoad ==="
rg -n 'moduleLoad\(' src/ --type=c -B 3 -A 2 | head -40

echo -e "\n=== Callers of moduleUnload or moduleUnloadAllModules ==="
rg -n 'moduleUnload|moduleUnloadAllModules' src/ --type=c -B 2 -A 2 | head -50

echo -e "\n=== Check if modules can be loaded via commands ==="
rg -n 'MODULE.*LOAD|loadModule|unloadModule' src/server.c | head -20

Repository: valkey-io/valkey

Length of output: 5131


🏁 Script executed:

# Look for MODULE LOAD/UNLOAD command implementation
echo "=== MODULE LOAD/UNLOAD command handlers ==="
sed -n '14550,14600p' src/module.c

# Check if there's any CLIENT PAUSE or similar synchronization
echo -e "\n=== Check for pausing or locking around module operations ==="
rg -n 'pauseClients|pauseall|pause' src/module.c | head -20

# Look at moduleUnloadInternal to see what checks it does
echo -e "\n=== moduleUnloadInternal implementation ==="
sed -n '13535,13625p' src/module.c

Repository: valkey-io/valkey

Length of output: 6254


Module LOAD/UNLOAD commands lack the outbox guard before drainIOThreadsQueue(), risking worker stalls at runtime.

The caveat in io-threads.md correctly warns that callers must guard against outbox overflow. However, VM_CreateCommand() (called via MODULE LOAD at line 1420) and moduleUnregisterCommands() (called via MODULE UNLOAD at line 13251) invoke drainIOThreadsQueue() without checking getPendingIOResponsesCount() first—unlike updateIOThreads() which has this guard. Since MODULE LOAD and UNLOAD are runtime commands that execute while I/O may be pending, these code paths can deadlock if the outbox fills during drain.

Either add the getPendingIOResponsesCount() > io_shared_outbox.queue_size check to both module functions, or pause I/O before module load/unload operations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-docs/io-threads.md` around lines 137 - 141, The functions
VM_CreateCommand() and moduleUnregisterCommands() both call
drainIOThreadsQueue() without the necessary outbox guard that exists in
updateIOThreads(). Add the same safety check—verifying that
getPendingIOResponsesCount() is not greater than
io_shared_outbox.queue_size—before invoking drainIOThreadsQueue() in both
VM_CreateCommand() and moduleUnregisterCommands(). If the condition is violated,
either defer the operation or implement an appropriate error handling strategy
to prevent potential worker stalls.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Will be good to add this check to leave the server in a correct state after the module gets unloaded. Let me see if I can reproduce this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hpatro, glad the finding is worth pursuing! Since this PR only adds the design document, a dedicated issue would be a good way to track the fix. Would you like me to open one on the valkey-io/valkey repository? It would cover:

  • Adding the getPendingIOResponsesCount() > io_shared_outbox.queue_size guard (or an equivalent pause-I/O mechanism) before the drainIOThreadsQueue() calls in VM_CreateCommand() (module.c ~line 1420) and moduleUnregisterCommands() (module.c ~line 13251).
  • Validating that MODULE LOAD and UNLOAD are safe to invoke at runtime under active I/O load.

Let me know and I'll create it right away.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

gh issue create \
  --repo "valkey-io/valkey" \
  --title "module: add outbox-guard before drainIOThreadsQueue() in VM_CreateCommand() and moduleUnregisterCommands()" \
  --body "## Summary

\`drainIOThreadsQueue()\` has a documented precondition (see [design-docs/io-threads.md](https://github.com/valkey-io/valkey/pull/4022)): callers must ensure the \`io_shared_outbox\` cannot fill while the spin is in progress, otherwise workers can stall in \`flushPendingIOResponses()\` and the drain never completes. \`updateIOThreads()\` already enforces this with:

\`\`\`c
if (getPendingIOResponsesCount() > io_shared_outbox.queue_size) { /* refuse */ }
\`\`\`

However, two module lifecycle paths call \`drainIOThreadsQueue()\` at runtime **without** this guard:

| Call site | File | Approx. line | Triggered by |
|---|---|---|---|
| \`VM_CreateCommand()\` | \`src/module.c\` | ~1420 | MODULE LOAD at runtime |
| \`moduleUnregisterCommands()\` | \`src/module.c\` | ~13251 | MODULE UNLOAD at runtime |

Because MODULE LOAD and UNLOAD are live commands that can execute while I/O is actively in flight, either call site can deadlock if the outbox fills during the drain spin.

## Required changes

In \`src/module.c\`, before each of the two \`drainIOThreadsQueue()\` calls:

1. Check \`getPendingIOResponsesCount() > io_shared_outbox.queue_size\` (or an equivalent \`pauseClients\`/pause-I/O mechanism).
2. If the condition is violated, either defer the operation and return an error to the caller, or fully drain the outbox first so the guard is satisfied before spinning.

The fix should mirror the approach already used in \`updateIOThreads()\` in \`src/io_threads.c\`.

## Acceptance criteria

- MODULE LOAD and UNLOAD no longer risk a deadlock/stall when \`io_shared_outbox\` is near capacity.
- Existing test coverage for module load/unload still passes.
- Ideally, a stress test or unit test exercises the guard path.

## References

- PR that introduced the design doc: https://github.com/valkey-io/valkey/pull/4022
- Review comment: https://github.com/valkey-io/valkey/pull/4022#discussion_r3460398103

/cc `@hpatro`"

Length of output: 2338



@hpatro, the issue has been created: #4026

It captures the two affected call sites in src/module.c, the required guard logic, and links back to this PR and review comment. Good luck tracking it down!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that we can create issues via coderabbit

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I learned something already from this.

We can always add more, but I won't ask for it in the first version. This is a good overview to start with.

In future follow-ups, I'd be interested in how the threads use busy spinning vs sleeping and something about each task delegated to I/O threads. Probably many more details that I don't know about.

Comment thread design-docs/io-threads.md Outdated
hpatro added 2 commits June 24, 2026 01:22
Document the shared I/O threading infrastructure: SPMC inbox, MPSC outbox,
per-worker SPSC inboxes, job lifecycle, drain semantics, live
reconfiguration, and the contract a feature must follow to add a new
consumer. Leave placeholder sections for client and cluster-bus consumers
to fill in.

Signed-off-by: Harkrishn Patro <h_patro@apple.com>
Signed-off-by: Harkrishn Patro <h_patro@apple.com>
@hpatro hpatro force-pushed the io-threads-design-doc branch from a765d89 to 54953dd Compare June 24, 2026 00:23
@hpatro

hpatro commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Will update more via #3438

@hpatro hpatro merged commit ead3dc6 into valkey-io:unstable Jun 24, 2026
4 checks passed
Comment thread design-docs/io-threads.md
the per-client `io_read_state` / `io_write_state` rather than the global
counter.

## Live Reconfiguration

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a general question, what is the split between high and low level details here. This feels like a comment that should sit over updateIOThreads, not something in a design doc. There is also specific mention to specific low level variables (like IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intertwined at this point which helps me visit the right section of code. I'm a newbie for this part so I like the mix of it. Happy to hear others thoughts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This document has been generated by Claude and I have done minor refactoring and removed sections I found unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants