Skip to content

Tighten NOT_KEY handling and add test coverage for NOT_KEY commands#3675

Merged
enjoy-binbin merged 9 commits into
valkey-io:unstablefrom
enjoy-binbin:clusterscan0
Jun 25, 2026
Merged

Tighten NOT_KEY handling and add test coverage for NOT_KEY commands#3675
enjoy-binbin merged 9 commits into
valkey-io:unstablefrom
enjoy-binbin:clusterscan0

Conversation

@enjoy-binbin

@enjoy-binbin enjoy-binbin commented May 12, 2026

Copy link
Copy Markdown
Member

No bug exists — all NOT_KEY commands (CLUSTERSCAN, SSUBSCRIBE,
SPUBLISH, SUNSUBSCRIBE) have NULL getkeys_proc, so doesCommandHaveKeys
returns 0 and ACL correctly skips key checks. This PR makes the
handling more explicit and defensive for future commands that may
combine getkeys_proc with NOT_KEY key-specs.

Changes:

  • Refactor doesCommandHaveKeys() from a one-line ternary into clearer
    if-else branches. Add an explicit check: if a command has getkeys_proc
    but all its key-specs are NOT_KEY, treat it as having no real keys.

  • Add a defensive NOT_KEY check in ACLSelectorCheckKey() to skip
    key-pattern ACL validation for NOT_KEY entries, consistent with the
    existing skip in getKeysUsingKeySpecs().

  • Add tests: verify COMMAND GETKEYS/GETKEYSANDFLAGS report "no key
    arguments" for NOT_KEY commands; verify restricted ACL users can
    run CLUSTERSCAN without key-permission errors.

Related: #2934, #3699

Currently, for the initial cursor—specifically `CLUSTERSCAN 0`,
we calculate the slot for "0" (yielding 13907) and then redirect
the request to the corresponding node. However, the initial cursor
"0" should, in principle, be executable by any node, as its sole
purpose is to return the next `CLUSTERSCAN` cursor:
```
    /* Handle cursor "0" case. If slot information is provided we return
     * the updated cursor to scan input slot, else scan from slot 0. */
    if (strcmp(objectGetVal(c->argv[1]), "0") == 0) {
        if (opts.input_slot != -1) {
            slot = opts.input_slot;
        } else if (opts.match_slot != -1) {
            slot = opts.match_slot; /* If match maps to a particular slot, start scan from there */
        } else {
            slot = 0;
        }

        addReplyArrayLen(c, 2);
        if (skip_scan) {
            addReplyBulkCString(c, "0");
        } else {
            sds new_cursor = sdscatfmt(sdsempty(), "0-{%s}-0", crc16_slot_table[slot]);
            addReplyBulkSds(c, new_cursor);
        }
        addReplyArrayLen(c, 0);
        return;
    }
```

Add clusterscanGetKeys: cursor "0" returns no keys (handle locally),
non-"0" cursors return the cursor itself so the embedded {hashtag}
routes to the correct slot owner.

In a three shards empty cluster, before:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
-> Redirected to slot [13907] located at 127.0.0.1:30003
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30003> clusterscan 0-{06S}-0
-> Redirected to slot [0] located at 127.0.0.1:30001
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```

In a three shards empty cluster, after:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{06S}-0
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```

CLUSTERSCAN was introduced in valkey-io#2934.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/db.c Outdated

@madolson madolson left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the way it is today for clients specifically. I think it's easier on the client side to mark the second key as opposed to a purpose built get keys command. The only benefit we get is that you can send the command to a random node.

Comment thread src/db.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e7a2ef6e-e6ce-4a51-8c0b-2bc43a6598e2

📥 Commits

Reviewing files that changed from the base of the PR and between 82ef047 and 7259957.

📒 Files selected for processing (1)
  • tests/unit/cluster/clusterscan.tcl
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/cluster/clusterscan.tcl

📝 Walkthrough

Walkthrough

The PR treats CMD_KEY_NOT_KEY key-spec entries as routing-only in ACL selection and key detection, and adds regression tests covering CLUSTERSCAN cursor tokens and COMMAND GETKEYS* behavior.

Changes

Routing-only token handling

Layer / File(s) Summary
Key-extraction contract and classification
src/db.c
doesCommandHaveKeys() now returns true only when a key-spec is not CMD_KEY_NOT_KEY, and routing-only key-specs suppress getkeys_proc key reporting.
ACL routing bypass
src/acl.c
ACLSelectorCheckKey() now allows CMD_KEY_NOT_KEY key-spec entries without running key-pattern permission checks.
Regression tests
tests/unit/introspection-2.tcl, tests/unit/cluster/clusterscan.tcl
Adds COMMAND GETKEYS* expectations for routing-only arguments and ACL-restricted CLUSTERSCAN cursor checks for plain and hashtag cursors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • hpatro
  • madolson
  • zuiderkwast
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: tighter NOT_KEY handling plus added tests.
Description check ✅ Passed The description is directly related to the changeset and matches the implementation and tests.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@enjoy-binbin

Copy link
Copy Markdown
Member Author

@nmvk @madolson I don't know the ACL details as much as you do, i agree this is a good-to-have option, let me know if we want to close it.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/cluster/clusterscan.tcl (1)

67-94: 💤 Low value

Consider cleaning up the ACL user after the test.

The test creates user scan_acl_leak but doesn't delete it afterward. While this test block likely gets reset between runs, adding cleanup improves test isolation.

♻️ Suggested cleanup
         $rd read

         $rd close
+
+        # Clean up the test user
+        R 0 ACL DELUSER scan_acl_leak
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cluster/clusterscan.tcl` around lines 67 - 94, Add cleanup to
remove the test ACL user "scan_acl_leak" after the test to avoid leaking state:
after the client $rd is closed (or immediately after the last $rd read), call
the corresponding ACL delete command (the inverse of R 0 ACL SETUSER used to
create the user) — e.g., issue R 0 ACL DELUSER scan_acl_leak — so the user
created by the test is removed when the test finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/cluster/clusterscan.tcl`:
- Around line 67-94: Add cleanup to remove the test ACL user "scan_acl_leak"
after the test to avoid leaking state: after the client $rd is closed (or
immediately after the last $rd read), call the corresponding ACL delete command
(the inverse of R 0 ACL SETUSER used to create the user) — e.g., issue R 0 ACL
DELUSER scan_acl_leak — so the user created by the test is removed when the test
finishes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4ea48c2c-d593-456f-b0b0-32ac6af5740d

📥 Commits

Reviewing files that changed from the base of the PR and between a813df0 and 688ee70.

📒 Files selected for processing (6)
  • src/acl.c
  • src/commands.def
  • src/commands/clusterscan.json
  • src/db.c
  • src/server.h
  • tests/unit/cluster/clusterscan.tcl

@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.82%. Comparing base (79bca53) to head (82ef047).

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3675      +/-   ##
============================================
+ Coverage     76.68%   76.82%   +0.13%     
============================================
  Files           162      162              
  Lines         81021    81023       +2     
============================================
+ Hits          62129    62243     +114     
+ Misses        18892    18780     -112     
Files with missing lines Coverage Δ
src/acl.c 92.65% <100.00%> (+<0.01%) ⬆️
src/db.c 94.85% <100.00%> (+<0.01%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread src/acl.c
Signed-off-by: Binbin <binloveplay1314@qq.com>

@nmvk nmvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @enjoy-binbin.

@zuiderkwast

Copy link
Copy Markdown
Contributor

I think it's fine that CLUSTERSCAN 0 belongs to slot 13907. Cluster clients should send the command to the node that owns slot 13907.

The client can also use a start cursor like "0-{Margret}-0" (HASH_SLOT("Margret") = 0) if it wants to send the command to the node owning slot 0, if it wants to start scanning at slot 0 and avoid a round trip. Or to whichever slot where it wants to start scanning.

I understand the inconvenience but an extra roundtrip is not expensive compared to a full cluster scan.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin changed the title Allow CLUSTERSCAN 0 to be executed on any node directly Tighten NOT_KEY handling and add test coverage for NOT_KEY commands Jun 25, 2026
@enjoy-binbin

Copy link
Copy Markdown
Member Author

@madolson @zuiderkwast @nmvk

The "CLUSTERSCAN 0 can be executed on any node directly" optimization now has been dropped. I have keep the other valuable parts, please refer to the top comment for the review.

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread src/server.h
Comment thread src/server.h
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>

@nmvk nmvk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enjoy-binbin enjoy-binbin merged commit 0473793 into valkey-io:unstable Jun 25, 2026
61 of 62 checks passed
@enjoy-binbin enjoy-binbin deleted the clusterscan0 branch June 25, 2026 05:54
enjoy-binbin pushed a commit that referenced this pull request Jun 25, 2026
While reviewing #3675 I found a pre-existing `NOT_KEY` inconsistency in
client tracking.

`CLUSTERSCAN` is skipped by `COMMAND GETKEYS/ACL` through the key spec
path, but tracking still uses `getKeysFromCommand()`. Updated to use
`getKeysFromCommandWithSpecs`

```
raghav$ printf 'client tracking on\ninfo stats\nclusterscan 0-{06S}-0\ninfo stats\n'   | /tmp/valkey-unstable-single/src/valkey-cli -p 7001 --raw   | grep -E '^(OK|tracking_total_keys:|0$)'
OK
tracking_total_keys:2
0
tracking_total_keys:2
````

This affects today `CLUSTERSCAN` because it is RO,
`SPUBLISH/SSUBSCRIBE/SUNSUBSCRIBE` using `NOT_KEY` are not RO.

Signed-off-by: nmvk <r@nmvk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants