Add design doc for I/O threads#4022
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughA new design document ChangesI/O Threads Design Document
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 markdownlint-cli2 (0.22.1)design-docs/io-threads.mdmarkdownlint-cli2 v0.22.1 (markdownlint v0.40.0) Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@akashkgit would be great if you could take a look. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
design-docs/io-threads.md (2)
49-53: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueClarify the purpose and design rationale of per-worker private inboxes.
The document states that per-worker SPSC inboxes are used for "jobs that must run on a specific worker (e.g. argv frees pinned by
cur_tid, poll jobs at high thread counts)." However, it doesn't explain why these jobs must run on a specific worker (thread affinity, per-worker state, cache locality, etc.) or how the main thread decides which worker to target. This information would help implementers understand when to use the shared vs. private inbox.Consider adding a brief sentence on the design rationale.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design-docs/io-threads.md` around lines 49 - 53, The table entry for io_private_inbox[i] lacks sufficient explanation of its design rationale and usage criteria. Expand the documentation to clarify why certain jobs must run on a specific worker (such as thread affinity for argv frees pinned by cur_tid, per-worker state management, or cache locality), how the main thread decides which worker to target, and provide concrete guidance on when implementers should route jobs to the private inbox versus the shared inbox for better decision-making.
79-83: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueClarify eligibility validation and per-consumer dispatch requirements.
The integration checklist mentions that a new consumer must add "a
try*ToIOThreads()dispatch helper on the main thread" that "validate[s] eligibility" (line 89), but the document doesn't explain what eligibility checks should be performed or what happens if eligibility fails. This could leave implementers uncertain about error handling and preconditions.Consider expanding this section with a few examples of eligibility checks (e.g., client state, load thresholds, feature flags).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design-docs/io-threads.md` around lines 79 - 83, The document mentions that the `try*ToIOThreads()` dispatch helper on the main thread must validate eligibility, but it does not explain what eligibility checks should be performed or what behavior occurs when eligibility validation fails. Expand the eligibility validation section (around line 89 where this is first mentioned) by adding clarification and concrete examples of eligibility checks that new consumers should consider, such as validating client state, checking load thresholds, or verifying feature flags. Also explain the error handling or fallback behavior when eligibility validation fails, so implementers understand the expected preconditions and failure modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@design-docs/io-threads.md`:
- Around line 137-141: The functions VM_CreateCommand() and
moduleUnregisterCommands() both call drainIOThreadsQueue() without the necessary
outbox guard that exists in updateIOThreads(). Add the same safety
check—verifying that getPendingIOResponsesCount() is not greater than
io_shared_outbox.queue_size—before invoking drainIOThreadsQueue() in both
VM_CreateCommand() and moduleUnregisterCommands(). If the condition is violated,
either defer the operation or implement an appropriate error handling strategy
to prevent potential worker stalls.
---
Nitpick comments:
In `@design-docs/io-threads.md`:
- Around line 49-53: The table entry for io_private_inbox[i] lacks sufficient
explanation of its design rationale and usage criteria. Expand the documentation
to clarify why certain jobs must run on a specific worker (such as thread
affinity for argv frees pinned by cur_tid, per-worker state management, or cache
locality), how the main thread decides which worker to target, and provide
concrete guidance on when implementers should route jobs to the private inbox
versus the shared inbox for better decision-making.
- Around line 79-83: The document mentions that the `try*ToIOThreads()` dispatch
helper on the main thread must validate eligibility, but it does not explain
what eligibility checks should be performed or what behavior occurs when
eligibility validation fails. Expand the eligibility validation section (around
line 89 where this is first mentioned) by adding clarification and concrete
examples of eligibility checks that new consumers should consider, such as
validating client state, checking load thresholds, or verifying feature flags.
Also explain the error handling or fallback behavior when eligibility validation
fails, so implementers understand the expected preconditions and failure modes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7a441b96-efa6-475f-bd8e-5403828af0df
📒 Files selected for processing (1)
design-docs/io-threads.md
| > **Caveat:** `drainIOThreadsQueue()` does not itself drain `io_shared_outbox`. | ||
| > Callers must ensure the outbox cannot fill while draining, otherwise workers | ||
| > can stall on `flushPendingIOResponses` and the spin will not progress. The | ||
| > current guard is in `updateIOThreads()`, which refuses the resize if | ||
| > `getPendingIOResponsesCount() > io_shared_outbox.queue_size`. |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all callers of drainIOThreadsQueue() and verify they have outbox guards
# Search for calls to drainIOThreadsQueue
rg -n 'drainIOThreadsQueue\(' src/ --type=c -B 10 -A 5
# For each call site, check if getPendingIOResponsesCount() is checked first
rg -n 'getPendingIOResponsesCount\|drainIOThreadsQueue' src/io_threads.c -B 5 -A 10Repository: valkey-io/valkey
Length of output: 5216
🏁 Script executed:
# Get detailed context around the two module.c calls to drainIOThreadsQueue()
echo "=== Context around line 1420 (module.c) ==="
sed -n '1400,1440p' src/module.c
echo -e "\n=== Context around line 13251 (module.c) ==="
sed -n '13230,13270p' src/module.c
# Also check for any guards or conditions in the functions calling drainIOThreadsQueue in module.c
echo -e "\n=== Check function signatures and preceding checks ==="
sed -n '1390,1425p' src/module.c | head -40
sed -n '13240,13275p' src/module.c | head -40Repository: valkey-io/valkey
Length of output: 7334
🏁 Script executed:
# Search for callers of VM_CreateCommand
echo "=== Callers of VM_CreateCommand ==="
rg -n 'VM_CreateCommand' src/ --type=c -B 3 -A 2
# Search for callers of moduleUnregisterCommands
echo -e "\n=== Callers of moduleUnregisterCommands ==="
rg -n 'moduleUnregisterCommands' src/ --type=c -B 3 -A 2
# Check module loading/unloading flow
echo -e "\n=== Module lifecycle functions ==="
rg -n 'loadModule|unloadModule|module.*load' src/ --type=c | head -20Repository: valkey-io/valkey
Length of output: 5557
🏁 Script executed:
# Find moduleLoad and moduleUnload callers to understand when they're called
echo "=== Callers of moduleLoad ==="
rg -n 'moduleLoad\(' src/ --type=c -B 3 -A 2 | head -40
echo -e "\n=== Callers of moduleUnload or moduleUnloadAllModules ==="
rg -n 'moduleUnload|moduleUnloadAllModules' src/ --type=c -B 2 -A 2 | head -50
echo -e "\n=== Check if modules can be loaded via commands ==="
rg -n 'MODULE.*LOAD|loadModule|unloadModule' src/server.c | head -20Repository: valkey-io/valkey
Length of output: 5131
🏁 Script executed:
# Look for MODULE LOAD/UNLOAD command implementation
echo "=== MODULE LOAD/UNLOAD command handlers ==="
sed -n '14550,14600p' src/module.c
# Check if there's any CLIENT PAUSE or similar synchronization
echo -e "\n=== Check for pausing or locking around module operations ==="
rg -n 'pauseClients|pauseall|pause' src/module.c | head -20
# Look at moduleUnloadInternal to see what checks it does
echo -e "\n=== moduleUnloadInternal implementation ==="
sed -n '13535,13625p' src/module.cRepository: valkey-io/valkey
Length of output: 6254
Module LOAD/UNLOAD commands lack the outbox guard before drainIOThreadsQueue(), risking worker stalls at runtime.
The caveat in io-threads.md correctly warns that callers must guard against outbox overflow. However, VM_CreateCommand() (called via MODULE LOAD at line 1420) and moduleUnregisterCommands() (called via MODULE UNLOAD at line 13251) invoke drainIOThreadsQueue() without checking getPendingIOResponsesCount() first—unlike updateIOThreads() which has this guard. Since MODULE LOAD and UNLOAD are runtime commands that execute while I/O may be pending, these code paths can deadlock if the outbox fills during drain.
Either add the getPendingIOResponsesCount() > io_shared_outbox.queue_size check to both module functions, or pause I/O before module load/unload operations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@design-docs/io-threads.md` around lines 137 - 141, The functions
VM_CreateCommand() and moduleUnregisterCommands() both call
drainIOThreadsQueue() without the necessary outbox guard that exists in
updateIOThreads(). Add the same safety check—verifying that
getPendingIOResponsesCount() is not greater than
io_shared_outbox.queue_size—before invoking drainIOThreadsQueue() in both
VM_CreateCommand() and moduleUnregisterCommands(). If the condition is violated,
either defer the operation or implement an appropriate error handling strategy
to prevent potential worker stalls.
There was a problem hiding this comment.
Interesting. Will be good to add this check to leave the server in a correct state after the module gets unloaded. Let me see if I can reproduce this.
There was a problem hiding this comment.
@hpatro, glad the finding is worth pursuing! Since this PR only adds the design document, a dedicated issue would be a good way to track the fix. Would you like me to open one on the valkey-io/valkey repository? It would cover:
- Adding the
getPendingIOResponsesCount() > io_shared_outbox.queue_sizeguard (or an equivalent pause-I/O mechanism) before thedrainIOThreadsQueue()calls inVM_CreateCommand()(module.c ~line 1420) andmoduleUnregisterCommands()(module.c ~line 13251). - Validating that MODULE LOAD and UNLOAD are safe to invoke at runtime under active I/O load.
Let me know and I'll create it right away.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
gh issue create \
--repo "valkey-io/valkey" \
--title "module: add outbox-guard before drainIOThreadsQueue() in VM_CreateCommand() and moduleUnregisterCommands()" \
--body "## Summary
\`drainIOThreadsQueue()\` has a documented precondition (see [design-docs/io-threads.md](https://github.com/valkey-io/valkey/pull/4022)): callers must ensure the \`io_shared_outbox\` cannot fill while the spin is in progress, otherwise workers can stall in \`flushPendingIOResponses()\` and the drain never completes. \`updateIOThreads()\` already enforces this with:
\`\`\`c
if (getPendingIOResponsesCount() > io_shared_outbox.queue_size) { /* refuse */ }
\`\`\`
However, two module lifecycle paths call \`drainIOThreadsQueue()\` at runtime **without** this guard:
| Call site | File | Approx. line | Triggered by |
|---|---|---|---|
| \`VM_CreateCommand()\` | \`src/module.c\` | ~1420 | MODULE LOAD at runtime |
| \`moduleUnregisterCommands()\` | \`src/module.c\` | ~13251 | MODULE UNLOAD at runtime |
Because MODULE LOAD and UNLOAD are live commands that can execute while I/O is actively in flight, either call site can deadlock if the outbox fills during the drain spin.
## Required changes
In \`src/module.c\`, before each of the two \`drainIOThreadsQueue()\` calls:
1. Check \`getPendingIOResponsesCount() > io_shared_outbox.queue_size\` (or an equivalent \`pauseClients\`/pause-I/O mechanism).
2. If the condition is violated, either defer the operation and return an error to the caller, or fully drain the outbox first so the guard is satisfied before spinning.
The fix should mirror the approach already used in \`updateIOThreads()\` in \`src/io_threads.c\`.
## Acceptance criteria
- MODULE LOAD and UNLOAD no longer risk a deadlock/stall when \`io_shared_outbox\` is near capacity.
- Existing test coverage for module load/unload still passes.
- Ideally, a stress test or unit test exercises the guard path.
## References
- PR that introduced the design doc: https://github.com/valkey-io/valkey/pull/4022
- Review comment: https://github.com/valkey-io/valkey/pull/4022#discussion_r3460398103
/cc `@hpatro`"Length of output: 2338
@hpatro, the issue has been created: #4026
It captures the two affected call sites in src/module.c, the required guard logic, and links back to this PR and review comment. Good luck tracking it down!
There was a problem hiding this comment.
Interesting that we can create issues via coderabbit
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
zuiderkwast
left a comment
There was a problem hiding this comment.
This is great! I learned something already from this.
We can always add more, but I won't ask for it in the first version. This is a good overview to start with.
In future follow-ups, I'd be interested in how the threads use busy spinning vs sleeping and something about each task delegated to I/O threads. Probably many more details that I don't know about.
Document the shared I/O threading infrastructure: SPMC inbox, MPSC outbox, per-worker SPSC inboxes, job lifecycle, drain semantics, live reconfiguration, and the contract a feature must follow to add a new consumer. Leave placeholder sections for client and cluster-bus consumers to fill in. Signed-off-by: Harkrishn Patro <h_patro@apple.com>
Signed-off-by: Harkrishn Patro <h_patro@apple.com>
a765d89 to
54953dd
Compare
|
Will update more via #3438 |
| the per-client `io_read_state` / `io_write_state` rather than the global | ||
| counter. | ||
|
|
||
| ## Live Reconfiguration |
There was a problem hiding this comment.
More of a general question, what is the split between high and low level details here. This feels like a comment that should sit over updateIOThreads, not something in a design doc. There is also specific mention to specific low level variables (like IO_IGNITION_MAIN_THREAD_ACTIVE_PERCENT).
There was a problem hiding this comment.
It's intertwined at this point which helps me visit the right section of code. I'm a newbie for this part so I like the mix of it. Happy to hear others thoughts.
There was a problem hiding this comment.
Note: This document has been generated by Claude and I have done minor refactoring and removed sections I found unnecessary.
Document the shared I/O threading infrastructure: SPMC inbox, MPSC outbox, per-worker SPSC inboxes, job lifecycle, drain semantics, live reconfiguration.
Will follow up with cluster io threads offload with #3438