Improve maintainability for kvstoreScan by JimB123 · Pull Request #3588 · valkey-io/valkey

JimB123 · 2026-04-29T14:53:43Z

The cursor in kvstore is composed of a hashtable cursor + an index of the hashtable within the kvstore. 48 bits are reserved for the hashtable cursor, and UP TO 16 bits are used for the hashtable index. The hashtable cursor is shifted by the number of bits actually needed for the hashtable index in a given kvstore (0-16).

This update shifts by a constant 16 bits. This eliminates variability in format. It also simplifies the cursor access functions as they no longer need to reference the specific kvstore in order to encode/decode the cursor. Reduced parameters and eliminated in/out parameters to cursor encode/decode functions.

After discussion about coupling and potential impact with the SCAN command(s), reduced the scope of this change to just addressing readability around the cursor conversions.

Updated kvstoreScan to more clearly show when a kvstore cursor was used vs. a hashtable cursor, rather than modifying it in-place in the same variable.

rainsupreme

Looks good to me!

While this is a worthy simplification, my understanding is that this is also part of the forkless work, as your plan is to use kvstore scan for the order that items are traversed, and it will need to understand whether incoming changes are before or after the cursor position.

codecov · 2026-04-29T18:26:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.67%. Comparing base (1d7224f) to head (97793a4).
⚠️ Report is 9 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #3588      +/-   ##
============================================
+ Coverage     76.66%   76.67%   +0.01%     
============================================
  Files           162      162              
  Lines         80644    80643       -1     
============================================
+ Hits          61823    61835      +12     
+ Misses        18821    18808      -13

Files with missing lines	Coverage Δ
src/kvstore.c	`96.61% <100.00%> (-0.04%)`	⬇️

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

zuiderkwast

This changes the cursor values so it affects cursor compatibilty between versions.

We need to bump the fingerprint in the CLUSTERSLOTS cursor introduced in #2934.

It breaks SCANs that are performed over a period involving a failover in a mixed-versions cluster such as during a rolling upgrade, even if the nodes are configured with a fixed hash seed as introduced in #2608.

Given these problems, I'm not sure it's a good idea to accept this change.

murphyjacob4

It's more readable to me now

zuiderkwast · 2026-04-29T19:16:01Z

We need cross-version tests for the cross-node SCAN features mentioned above.

murphyjacob4 · 2026-04-29T19:38:10Z

It breaks SCANs that are performed over a period involving a failover in a mixed-versions cluster such as during a rolling upgrade, even if the nodes are configured with a fixed hash seed as introduced in #2608.

Good point on CLUSTERSCAN

I specifically remember in the CLUSTERSCAN design we wanted to encode the "memory layout version" to prevent the need for the cursor to maintain cross version stability. I think we should be okay making the cursor reset across versions, it wouldn't be breaking, just less efficient. Otherwise, it is very restrictive. We should be okay re-fingerprinting on each minor version, if needed.

murphyjacob4 · 2026-04-29T19:45:09Z

We need cross-version tests for the cross-node SCAN features mentioned above.

Yeah, we just need a test to catch when the cursor or memory layout changes. The test can be updated whenever we make those changes to make it pass

madolson · 2026-04-30T00:36:16Z

I specifically remember in the CLUSTERSCAN design we wanted to encode the "memory layout version" to prevent the need for the cursor to maintain cross version stability. I think we should be okay making the cursor reset across versions, it wouldn't be breaking, just less efficient. Otherwise, it is very restrictive. We should be okay re-fingerprinting on each minor version, if needed.

We wanted to encode so that we could change it as needed, I don't think readability is reason to change it. I agree with Viktor that I would rather not take this. Also, agree on the cross version scan test. We already have cross version testing, we can add SCAN to that.

roshkhatri

just suggestion not different from whats already said:

Add to CLUSTERSCAN fingerprint

as introduced in #2934, the fingerprint in clusterscanFingerprint() only hashes the see, we can incorporate a cursor layout version so the mismatch is detected and the scan can restarts cleanly:

in src/cluster.c

#define CURSOR_LAYOUT_VERSION 1
static const char *clusterscanFingerprint(void) {
.....
uint64_t hash = wangHash64(seed[0] ^ seed[1] ^ CURSOR_LAYOUT_VERSION);
.....
}

like @zuiderkwast @murphyjacob4 mentioned.

WDYT?

murphyjacob4 · 2026-04-30T06:20:02Z

We wanted to encode so that we could change it as needed, I don't think readability is reason to change it.

Yeah this was one thing I was worried about with the CLUSTERSCAN feature - I don't like us adding additional contracts (whether implicit, best effort, or otherwise) to the stability of the cursor over versions. I would almost prefer that we always break it over minor versions just so nobody takes a dependency on this and then gets broken when a bump does happen.

Also CLUSTERSCAN hasn't GA'd yet - so we could always merge this into 9.1 before GA and it should be fine?

zuiderkwast · 2026-04-30T10:52:42Z

Yeah this was one thing I was worried about with the CLUSTERSCAN feature - I don't like us adding additional contracts (whether implicit, best effort, or otherwise) to the stability of the cursor over versions. I would almost prefer that we always break it over minor versions just so nobody takes a dependency on this and then gets broken when a bump does happen.

I agree the contracts should be explicit.

Also CLUSTERSCAN hasn't GA'd yet - so we could always merge this into 9.1 before GA and it should be fine?

Good point that CLUSTERSCAN isn't GA yet. Nether is hash-seed (#2608). We do have a strategy for CLUSTERSCAN with the fingerprint, but what can we do for hash-seed in the future?

We discussed at some point in the past that we have two spare bits in the cursor (we only need 14 bits for the cluster slot, not 16) and that we can use these as a version of some kind, but that'd be internal and clients shouldn't be aware of the cursor representation. Perhaps we should just explicitly mention that it's not stable accross version in its documentation in valkey.conf, which currently doesn't cover versions:

# Use a fixed hash seed for hashtable instead of a random one.
# Setting this option makes commands like SCAN return keys in a consistent
# order across restarts and failovers. The seed can be any string up to 256 characters.
# The value is immutable and must be provided only at server startup.
#
# hash-seed example-seed-val

Another thing I remember we discussed when we introduced kvstore is that we can avoid returning very large numbers for the cursor. IIRC, that's why we didn't shift it in standalone mode and also why we put the slot in the lower bits instead of in the highest 16 bits. A small hash table will have a relatively small cursor.

murphyjacob4 · 2026-05-01T01:49:02Z

# Setting this option makes commands like SCAN return keys in a consistent
# order across restarts and failovers. The seed can be any string up to 256 characters.

The way this is documented - I expect people would think it is stable across versions. In fact, if there is no "fingerprint" in SCAN, they may silently skip some keys in the iteration during failover if the ordering changes across versions. This may result in correctness issues, if the SCAN user is relying on it returning all keys at least once.

We should explicitly document it is only stable if all instances are running the same version. We should also point out CLUSTERSCAN as a better alternative (where the cursor includes the fingerprint and would restart on memory layout changes) - I still don't think SCAN is safe for this kind of thing. But CLUSTERSCAN is not yet available on standalone - should we reconsider that?

Another thing I remember we discussed when we introduced kvstore is that we can avoid returning very large numbers for the cursor. IIRC, that's why we didn't shift it in standalone mode and also why we put the slot in the lower bits instead of in the highest 16 bits. A small hash table will have a relatively small cursor.

Yeah... cursor size seems like a non-issue. In theory we save a few bytes on the ascii encoding but if you are anyways doing SCAN, it would be a very small amount of overhead.

cursor representation

I don't care too much about which way we go. I do think 48|16 is simpler. But the variable 64|0 and 50|14 representation is manageable. Either way, it is good to hash out the consequences of CLUSTERSCAN and hash-seed before 9.1 GA

madolson · 2026-05-02T00:02:22Z

We should explicitly document it is only stable if all instances are running the same version. We should also point out CLUSTERSCAN as a better alternative (where the cursor includes the fingerprint and would restart on memory layout changes) - I still don't think SCAN is safe for this kind of thing. But CLUSTERSCAN is not yet available on standalone - should we reconsider that?

I feel like we (and by we, it's probably me) did a poor job explaining why a stable SCAN is important. It's not for the SCAN commands, it's for HSCAN and the like during failover or failures in cluster mode. Since it will be transparently redirected. Improving the stability of SCAN is useful, but scan is almost always a special case anyways.

rainsupreme · 2026-05-04T19:31:43Z

What did we do when we switched from dict to hashtable in kvstore and the datatypes? I don't remember discussing what happens to scans when a failover happens. Maybe we already have incompatibility between those versions?

rainsupreme · 2026-05-04T19:39:47Z

We should explicitly document it is only stable if all instances are running the same version. We should also point out CLUSTERSCAN as a better alternative (where the cursor includes the fingerprint and would restart on memory layout changes) - I still don't think SCAN is safe for this kind of thing. But CLUSTERSCAN is not yet available on standalone - should we reconsider that?

I feel like we (and by we, it's probably me) did a poor job explaining why a stable SCAN is important. It's not for the SCAN commands, it's for HSCAN and the like during failover or failures in cluster mode. Since it will be transparently redirected. Improving the stability of SCAN is useful, but scan is almost always a special case anyways.

This change only affects kvstore scans though, i.e. just the scan command, so it wouldn't affect hscan and the other datatype scans, right? they use a simple un-shifted hashtable cursor afaik

JimB123 · 2026-05-08T17:54:38Z

Certainly interesting discussion here. At this point, I'm of the mind that I can correct the readability issue - but leave the encoding as is.

One thing that we should start to think about is that we can consider decoupling internal cursors from the external values used by client commands. For example, if we're worried about the cursor value being too large, and want to remove extra bits, this can be done by scanCommand (which services the client commands) - and should not necessarily be a concern for the internal format of the kvstore scan cursor.

Signed-off-by: Jim Brunner <brunnerj@amazon.com>

zuiderkwast · 2026-05-08T21:19:41Z

    int skip = !ht || (skip_cb && skip_cb(ht)) || kvstoreIsImporting(kvs, didx);
    if (!skip) {
-        next_cursor = hashtableScan(ht, cursor, hashtableScanToKvstoreScanCallback, &cb_data);
+        ht_cursor = hashtableScan(ht, ht_cursor, hashtableScanToKvstoreScanCallback, &cb_data);


Distinguishing between current cursor and next cursor is clearer. Removing the distinction make the same variable mean different things above and below this line.

This is just an increment of the ht_cursor. The meaning remains the same... it's the current cursor both above and below. I think most readers would find this more intuitive. There's no need to maintain a variable for the old cursor value, and having 2 variables provides opportunity for mistake.

They are different values. They represent the cursor at different points in time.

In functional languages where variables are bound once and values are immutable (Haskell, ML, Erlang, Elixir, PureScript(?)), the single-assignement is considered to give great clarity. In math text, you never see x = x + 1. Instead you see x_n+1 = x_n + 1.

Now, C isn't a functional language like that, but it's possible to write in different styles. Here, the style with different variables for the old and the new cursor is used and there is no reason to change it. It's subjective, and when it is, we don't change it.

JimB123 · 2026-05-11T21:00:16Z

@murphyjacob4 says:

It's more readable to me now

@zuiderkwast says:

I think the whole PR is mostly an unnecessary change. The only substantial change was about the bit representation of the cursor, which you've now reverted.

I agree that with the bit representation unchanged, this is just about readability. It eliminates the code smell of in-place conversion of the cursor format, and using the same variable for ht_cursor and kvs_cursor interchangeably. From a code complexity perspective, it eliminates more complex in/out parameters to the conversion routines. It clarifies naming (matter of opinion) and slightly improves the IF logic around the check for KVSTORE_INDEX_NOT_FOUND.

But ultimately, it's just about readability/maintainability. At this point, we should either merge or close.

zuiderkwast · 2026-05-22T09:48:00Z

It eliminates the code smell of in-place conversion of the cursor format, and using the same variable for ht_cursor and kvs_cursor interchangeably. From a code complexity perspective, it eliminates more complex in/out parameters to the conversion routines.

This is true.

Feel free to merge it. It's been approved by multiple maintainers. I don't mind. (I just didn't think it was that important).

JimB123 requested a review from zuiderkwast April 29, 2026 14:53

github-actions Bot assigned JimB123 Apr 29, 2026

JimB123 requested a review from rainsupreme April 29, 2026 14:53

rainsupreme approved these changes Apr 29, 2026

View reviewed changes

JimB123 force-pushed the kvstore-cursor branch from c47a7d0 to 2322d19 Compare April 29, 2026 18:01

zuiderkwast reviewed Apr 29, 2026

View reviewed changes

murphyjacob4 approved these changes Apr 29, 2026

View reviewed changes

roshkhatri reviewed Apr 30, 2026

View reviewed changes

Comment thread src/kvstore.c

enjoy-binbin approved these changes Apr 30, 2026

View reviewed changes

improve readability for kvstoreScan

97793a4

Signed-off-by: Jim Brunner <brunnerj@amazon.com>

JimB123 force-pushed the kvstore-cursor branch from 2a79479 to 97793a4 Compare May 8, 2026 20:26

JimB123 changed the title ~~standardize kvstore scan cursor~~ Improve maintainability for kvstoreScan May 8, 2026

JimB123 commented May 8, 2026

View reviewed changes

Comment thread src/kvstore.c

Comment thread src/kvstore.c

zuiderkwast reviewed May 8, 2026

View reviewed changes

Comment thread src/kvstore.c

avifenesh mentioned this pull request May 10, 2026

[dogfood] Improve maintainability for kvstoreScan (mirrors upstream #3588) avifenesh/valkey#8

Closed

JimB123 merged commit 436dcae into valkey-io:unstable Jun 17, 2026
61 checks passed

Uh oh!

Conversation

JimB123 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rainsupreme left a comment

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

murphyjacob4 left a comment

Choose a reason for hiding this comment

Uh oh!

zuiderkwast commented Apr 29, 2026

Uh oh!

murphyjacob4 commented Apr 29, 2026

Uh oh!

murphyjacob4 commented Apr 29, 2026

Uh oh!

madolson commented Apr 30, 2026

Uh oh!

roshkhatri left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

murphyjacob4 commented Apr 30, 2026

Uh oh!

zuiderkwast commented Apr 30, 2026

Uh oh!

murphyjacob4 commented May 1, 2026

Uh oh!

madolson commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rainsupreme commented May 4, 2026

Uh oh!

rainsupreme commented May 4, 2026

Uh oh!

JimB123 commented May 8, 2026

Uh oh!

Uh oh!

Uh oh!

zuiderkwast May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimB123 May 11, 2026

Choose a reason for hiding this comment

Uh oh!

zuiderkwast May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

JimB123 commented May 11, 2026

Uh oh!

zuiderkwast commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

JimB123 commented Apr 29, 2026 •

edited

Loading

codecov Bot commented Apr 29, 2026 •

edited

Loading

roshkhatri left a comment •

edited

Loading

madolson commented May 2, 2026 •

edited

Loading

zuiderkwast May 8, 2026 •

edited

Loading

zuiderkwast May 11, 2026 •

edited

Loading