Skip to content

fix(cluster): Remove per-call srand in clusterManagerNodePrimaryRandom#3586

Merged
ranshid merged 3 commits into
valkey-io:unstablefrom
abmathur-ie:fix/srand-uncovered-slots
Apr 30, 2026
Merged

fix(cluster): Remove per-call srand in clusterManagerNodePrimaryRandom#3586
ranshid merged 3 commits into
valkey-io:unstablefrom
abmathur-ie:fix/srand-uncovered-slots

Conversation

@abmathur-ie

Copy link
Copy Markdown
Contributor

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every invocation, then immediately rand() % primary_count. When called in a tight loop for uncovered slots, all calls within the same wall-clock second produce the identical seed, causing every uncovered slot to be assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup (srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to advance its state across calls, distributing uncovered slots randomly across available primaries.

@murphyjacob4

murphyjacob4 commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

srand(time(NULL) ^ getpid()) at line 9838

This is only called in LRUTestMode, right?

https://github.com/valkey-io/valkey/blob/unstable/src/valkey-cli.c#L9832-L9838

Overall, it would make sense to just srand once

@codecov

codecov Bot commented Apr 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.58%. Comparing base (5b7ac66) to head (908f728).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3586      +/-   ##
============================================
- Coverage     76.66%   76.58%   -0.08%     
============================================
  Files           160      160              
  Lines         80472    80469       -3     
============================================
- Hits          61690    61625      -65     
- Misses        18782    18844      +62     
Files with missing lines Coverage Δ
src/valkey-cli.c 60.18% <100.00%> (+0.01%) ⬆️

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@abmathur-ie

abmathur-ie commented Apr 29, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback @murphyjacob4 . Addressed the issue and made a single srand call now. this should not have the DCO issue either.

Abhishek Mathur added 2 commits April 28, 2026 22:24
clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
  Move the PRNG seeding to a single srand(time(NULL) ^ getpid()) call
  in main(), removing redundant per-function srand() calls from
  clusterManagerOptimizeAntiAffinity(), pipeMode(), and LRUTestMode().

  This ensures consistent entropy (pid mixing) across all code paths
  and prevents any future tight-loop reseeding bugs like the one fixed
  in clusterManagerNodePrimaryRandom().

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
@abmathur-ie abmathur-ie force-pushed the fix/srand-uncovered-slots branch from 12f27a9 to 749f882 Compare April 29, 2026 05:24

@rainsupreme rainsupreme left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Good find on the bug :)

@roshkhatri roshkhatri left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM,

Nit. can you please update the top comment and PRNG is already seeded at startup (srand(time(NULL) ^ getpid()) at line 9838), we can change it to this approach consolidated all srand calls into main()

@ranshid ranshid left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thank you @abmathur-ie !

@ranshid ranshid moved this to To be backported in Valkey 7.2 Apr 30, 2026
@ranshid ranshid moved this to To be backported in Valkey 8.0 Apr 30, 2026
@ranshid ranshid moved this to To be backported in Valkey 8.1 Apr 30, 2026
@ranshid ranshid moved this to To be backported in Valkey 9.0 Apr 30, 2026
@ranshid ranshid moved this to To be backported in Valkey 9.1 Apr 30, 2026
@ranshid ranshid merged commit 7e2a2f7 into valkey-io:unstable Apr 30, 2026
59 checks passed
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
valkey-io#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
valkey-io#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
valkey-io#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
valkey-io#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
ranshid added a commit that referenced this pull request May 12, 2026
Backport of #3586 to the 9.1 branch.

## Problem

`clusterManagerNodePrimaryRandom()` called `srand(time(NULL))` on every
invocation, then immediately `rand() % primary_count`. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

The same issue existed in `clusterManagerOptimizeAntiAffinity`,
`pipeMode`, and `LRUTestMode`.

## Solution

Remove all per-call `srand()` invocations and consolidate into a single
`srand(time(NULL) ^ getpid())` at the start of `main()`. This allows
`rand()` to advance its state across calls, distributing uncovered slots
randomly across available primaries.

(cherry picked from commit 7e2a2f7)

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@sarthakaggarwal97 sarthakaggarwal97 moved this from To be backported to Done in Valkey 9.1 May 16, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 20, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 29, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.8 (WIP) in Valkey 8.1 Jun 2, 2026
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 3, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 13, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
@valkeyrie-ops valkeyrie-ops Bot moved this from To be backported to Done in Valkey 9.0 Jun 16, 2026
@valkeyrie-ops valkeyrie-ops Bot moved this from To be backported to Done in Valkey 7.2 Jun 16, 2026
@valkeyrie-ops valkeyrie-ops Bot moved this from To be backported to Done in Valkey 8.0 Jun 16, 2026
sarthakaggarwal97 pushed a commit that referenced this pull request Jun 17, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
sarthakaggarwal97 pushed a commit that referenced this pull request Jun 18, 2026
#3586)

clusterManagerNodePrimaryRandom() called srand(time(NULL)) on every
invocation, then immediately rand() % primary_count. When called in a
tight loop for uncovered slots, all calls within the same wall-clock
second produce the identical seed, causing every uncovered slot to be
assigned to the same primary node.

Remove the srand() call since the PRNG is already seeded at startup
(srand(time(NULL) ^ getpid()) at line 9838). This allows rand() to
advance its state across calls, distributing uncovered slots randomly
across available primaries.

---------

Signed-off-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Abhishek Mathur <matshek@amazon.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done
Status: Done
Status: 8.1.8
Status: Done
Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants