Tighten NOT_KEY handling and add test coverage for NOT_KEY commands#3675
Conversation
Currently, for the initial cursor—specifically `CLUSTERSCAN 0`,
we calculate the slot for "0" (yielding 13907) and then redirect
the request to the corresponding node. However, the initial cursor
"0" should, in principle, be executable by any node, as its sole
purpose is to return the next `CLUSTERSCAN` cursor:
```
/* Handle cursor "0" case. If slot information is provided we return
* the updated cursor to scan input slot, else scan from slot 0. */
if (strcmp(objectGetVal(c->argv[1]), "0") == 0) {
if (opts.input_slot != -1) {
slot = opts.input_slot;
} else if (opts.match_slot != -1) {
slot = opts.match_slot; /* If match maps to a particular slot, start scan from there */
} else {
slot = 0;
}
addReplyArrayLen(c, 2);
if (skip_scan) {
addReplyBulkCString(c, "0");
} else {
sds new_cursor = sdscatfmt(sdsempty(), "0-{%s}-0", crc16_slot_table[slot]);
addReplyBulkSds(c, new_cursor);
}
addReplyArrayLen(c, 0);
return;
}
```
Add clusterscanGetKeys: cursor "0" returns no keys (handle locally),
non-"0" cursors return the cursor itself so the embedded {hashtag}
routes to the correct slot owner.
In a three shards empty cluster, before:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
-> Redirected to slot [13907] located at 127.0.0.1:30003
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30003> clusterscan 0-{06S}-0
-> Redirected to slot [0] located at 127.0.0.1:30001
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```
In a three shards empty cluster, after:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{06S}-0
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```
CLUSTERSCAN was introduced in valkey-io#2934.
Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson
left a comment
There was a problem hiding this comment.
I prefer the way it is today for clients specifically. I think it's easier on the client side to mark the second key as opposed to a purpose built get keys command. The only benefit we get is that you can send the command to a random node.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR treats ChangesRouting-only token handling
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/cluster/clusterscan.tcl (1)
67-94: 💤 Low valueConsider cleaning up the ACL user after the test.
The test creates user
scan_acl_leakbut doesn't delete it afterward. While this test block likely gets reset between runs, adding cleanup improves test isolation.♻️ Suggested cleanup
$rd read $rd close + + # Clean up the test user + R 0 ACL DELUSER scan_acl_leak }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/cluster/clusterscan.tcl` around lines 67 - 94, Add cleanup to remove the test ACL user "scan_acl_leak" after the test to avoid leaking state: after the client $rd is closed (or immediately after the last $rd read), call the corresponding ACL delete command (the inverse of R 0 ACL SETUSER used to create the user) — e.g., issue R 0 ACL DELUSER scan_acl_leak — so the user created by the test is removed when the test finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/cluster/clusterscan.tcl`:
- Around line 67-94: Add cleanup to remove the test ACL user "scan_acl_leak"
after the test to avoid leaking state: after the client $rd is closed (or
immediately after the last $rd read), call the corresponding ACL delete command
(the inverse of R 0 ACL SETUSER used to create the user) — e.g., issue R 0 ACL
DELUSER scan_acl_leak — so the user created by the test is removed when the test
finishes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4ea48c2c-d593-456f-b0b0-32ac6af5740d
📒 Files selected for processing (6)
src/acl.csrc/commands.defsrc/commands/clusterscan.jsonsrc/db.csrc/server.htests/unit/cluster/clusterscan.tcl
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3675 +/- ##
============================================
+ Coverage 76.68% 76.82% +0.13%
============================================
Files 162 162
Lines 81021 81023 +2
============================================
+ Hits 62129 62243 +114
+ Misses 18892 18780 -112
🚀 New features to boost your workflow:
|
Signed-off-by: Binbin <binloveplay1314@qq.com>
nmvk
left a comment
There was a problem hiding this comment.
LGTM, Thanks @enjoy-binbin.
|
I think it's fine that CLUSTERSCAN 0 belongs to slot 13907. Cluster clients should send the command to the node that owns slot 13907. The client can also use a start cursor like I understand the inconvenience but an extra roundtrip is not expensive compared to a full cluster scan. |
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
The "CLUSTERSCAN 0 can be executed on any node directly" optimization now has been dropped. I have keep the other valuable parts, please refer to the top comment for the review. |
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
While reviewing #3675 I found a pre-existing `NOT_KEY` inconsistency in client tracking. `CLUSTERSCAN` is skipped by `COMMAND GETKEYS/ACL` through the key spec path, but tracking still uses `getKeysFromCommand()`. Updated to use `getKeysFromCommandWithSpecs` ``` raghav$ printf 'client tracking on\ninfo stats\nclusterscan 0-{06S}-0\ninfo stats\n' | /tmp/valkey-unstable-single/src/valkey-cli -p 7001 --raw | grep -E '^(OK|tracking_total_keys:|0$)' OK tracking_total_keys:2 0 tracking_total_keys:2 ```` This affects today `CLUSTERSCAN` because it is RO, `SPUBLISH/SSUBSCRIBE/SUNSUBSCRIBE` using `NOT_KEY` are not RO. Signed-off-by: nmvk <r@nmvk.com>
No bug exists — all NOT_KEY commands (CLUSTERSCAN, SSUBSCRIBE,
SPUBLISH, SUNSUBSCRIBE) have NULL getkeys_proc, so doesCommandHaveKeys
returns 0 and ACL correctly skips key checks. This PR makes the
handling more explicit and defensive for future commands that may
combine getkeys_proc with NOT_KEY key-specs.
Changes:
Refactor doesCommandHaveKeys() from a one-line ternary into clearer
if-else branches. Add an explicit check: if a command has getkeys_proc
but all its key-specs are NOT_KEY, treat it as having no real keys.
Add a defensive NOT_KEY check in ACLSelectorCheckKey() to skip
key-pattern ACL validation for NOT_KEY entries, consistent with the
existing skip in getKeysUsingKeySpecs().
Add tests: verify COMMAND GETKEYS/GETKEYSANDFLAGS report "no key
arguments" for NOT_KEY commands; verify restricted ACL users can
run CLUSTERSCAN without key-permission errors.
Related: #2934, #3699