Make the number of healthy nodes in gossip section configurable by hpatro · Pull Request #2746 · valkey-io/valkey

hpatro · 2025-10-17T18:29:36Z

This PR introduces a new config cluster-message-gossip-perc which allows an operator to modify the amount of gossip node information to be sent per ping/pong/meet message. It can be modified dynamically (Related to #2291). The default value is 10% i.e. 10% of peer node information would be gossiped along with each ping/pong/meet packet. Users can tune this configuration, setting the value higher allows faster information dissemination whereas setting it lower would lead to direct PING messages if no information was received about a node with the server.cluster_node_timeout/2 period.

Note: the behavior for partially failed gossip nodes still remains intact where all the pfail nodes are part of the message for faster propagation of information and faster transition of PFAIL to FAIL.

zuiderkwast · 2025-10-17T21:23:52Z

What are the theoretical implications of lowering the number?

Sending 10 pings with n/10 gossips achieves the same information-spreading effect as sending 20 pings n/20 gossips? So failure detection and convenrgence of any changes slows down linearly with this config?

I'm fine with a config like this, but I (and others, you included?) have a feeling we can gossip smarter without sacrificing anything.

I really liked the idea of prioritizing gossips about node for which there was a recent change, this idea: #1897 (comment)

Can we add a last-modified timetamp to each node and do a weighted random?

Another idea is to increment a score for each node we gossip about and then prioritize the ones with lower score next time.

hpatro · 2025-10-17T22:05:53Z

I really liked the idea of prioritizing gossips about node for which there was a recent change, this idea: #1897 (comment)

I came across this idea in hashicorp's serf. It requires a bit of work to index node by different type and have logic around which ones to prioritise more. Quite achievable to have smarter gossip.

This PR is a guardrail for the current system to avoid CPU/network spikes.

hpatro · 2025-10-18T00:01:33Z

What are the theoretical implications of lowering the number?

We need to guarantee a message to be received directly or indirectly from another node within node-timeout/2 period. If that is met we don't send out another message.

So, this might lead to more direct pings which has higher overhead. Gossip node information is 106 bytes. However, the entire payload is around 2200B.

zuiderkwast

I think this is pretty clean and we could merge in it for 9.1 to be able to tune and evaluate it over time, but are we sure we want to maintain this config going forward? We could say it's experimental and make it a hidden config to start with.

Can you mention in the top comment the theoretical implications of for example lowering the value to 5%? Is it like: If we didn't get gossip about a node during a node timeout, we ping it, so in the worst case we'll just send more pings? Doesn't this mean we can measure the cluster bus traffic and tune this value? When we've done that, we can change the default?

The failing test case is not affected by this change, right?

[err]: Multiple primary nodes are down, rank them based on the failed primary in tests/unit/cluster/failover2.tcl
No failover detected

hpatro · 2026-03-05T19:20:46Z

I think this is pretty clean and we could merge in it for 9.1 to be able to tune and evaluate it over time, but are we sure we want to maintain this config going forward? We could say it's experimental and make it a hidden config to start with.

Yeah, makes sense to move it to a hidden config.

Can you mention in the top comment the theoretical implications of for example lowering the value to 5%? Is it like: If we didn't get gossip about a node during a node timeout, we ping it, so in the worst case we'll just send more pings? Doesn't this mean we can measure the cluster bus traffic and tune this value? When we've done that, we can change the default?

Pretty spot on. Will add it above.

The failing test case is not affected by this change, right?
[err]: Multiple primary nodes are down, rank them based on the failed primary in tests/unit/cluster/failover2.tcl
No failover detected

Taking a look.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

sarthakaggarwal97 · 2026-03-06T18:13:02Z

+     * information would be broadcasted. */
    int pfail_wanted = server.cluster->stats_pfail_nodes;
+    if (pfail_wanted >= wanted) {
+        pfail_wanted = wanted;


I am not too sure about this but if pfail_wanted = wanted, is it possible that it will keep sending the same nodes again and again if the quorum is not reached to mark them as fail.

Yeah the test failure seems to be pointing to that as well. We risk slower propagation of information. I'm trying to see how we can converge sooner.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

zuiderkwast · 2026-03-06T20:09:58Z

The four failed for CI jobs all failed on this test case:

*** [err]: Multiple primary nodes are down, rank them based on the failed primary in tests/unit/cluster/failover2.tcl
No failover detected

Is it caused by this change?

hpatro · 2026-03-06T22:14:53Z

The four failed for CI jobs all failed on this test case:
*** [err]: Multiple primary nodes are down, rank them based on the failed primary in tests/unit/cluster/failover2.tcl
No failover detected
Is it caused by this change?

Yes, I'm suspicious of that. Let's decouple the pfail nodes accounting as part of this change. I will tackle it separately. The reasoning is it slows down all the pfail node discovery and would rather do it in a batched manner. I've reverted to the default mechanism 83dc041.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

codecov · 2026-03-06T22:49:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.05%. Comparing base (8c397ea) to head (77eb3b1).
⚠️ Report is 6 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2746      +/-   ##
============================================
+ Coverage     74.98%   75.05%   +0.07%     
============================================
  Files           129      129              
  Lines         71551    71632      +81     
============================================
+ Hits          53654    53765     +111     
+ Misses        17897    17867      -30

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`88.10% <ø> (+0.03%)`	⬆️
src/config.c	`78.56% <ø> (ø)`
src/server.h	`100.00% <ø> (ø)`

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sarthakaggarwal97

LGTM!

zuiderkwast

OK, looks good.

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

…le (valkey-io#2746)" This reverts commit 0ff55de.

This PR introduces a new config `cluster-message-gossip-perc` which allows an operator to modify the amount of gossip node information to be sent per ping/pong/meet message. It can be modified dynamically (Related to #2291). The default value is 10% i.e. 10% of peer node information would be gossiped along with each ping/pong/meet packet. Users can tune this configuration, setting the value higher allows faster information dissemination whereas setting it lower would lead to direct PING messages if no information was received about a node with the `server.cluster_node_timeout/2` period. Note: the behavior for partially failed gossip nodes still remains intact where all the `pfail` nodes are part of the message for faster propagation of information and faster transition of `PFAIL` to `FAIL`. --------- Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro requested a review from madolson October 17, 2025 18:29

github-actions Bot assigned hpatro Oct 17, 2025

hpatro force-pushed the cluster_ping_gossip_count branch 2 times, most recently from 7b29e59 to c5642e8 Compare October 17, 2025 18:40

roshkhatri reviewed Oct 17, 2025

View reviewed changes

Comment thread src/cluster_legacy.c Outdated

roshkhatri reviewed Oct 17, 2025

View reviewed changes

Comment thread src/cluster_legacy.c Outdated

roshkhatri reviewed Oct 17, 2025

View reviewed changes

Comment thread src/cluster_legacy.c Outdated

sarthakaggarwal97 reviewed Oct 24, 2025

View reviewed changes

Comment thread src/config.c Outdated

zuiderkwast reviewed Feb 26, 2026

View reviewed changes

Comment thread src/cluster_legacy.c Outdated

sarthakaggarwal97 reviewed Feb 26, 2026

View reviewed changes

Comment thread src/cluster_legacy.c Outdated

Comment thread src/cluster_legacy.c

hpatro force-pushed the cluster_ping_gossip_count branch 3 times, most recently from 78c92ec to 1a62721 Compare March 5, 2026 22:01

Bound gossip node information in a clusterbus ping/pong messages

40bc9f3

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro force-pushed the cluster_ping_gossip_count branch from 1a62721 to 40bc9f3 Compare March 6, 2026 05:40

Refactor tcl test

449709d

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

zuiderkwast reviewed Mar 6, 2026

View reviewed changes

Comment thread valkey.conf Outdated

zuiderkwast reviewed Mar 6, 2026

View reviewed changes

Comment thread src/config.c Outdated

Address feedback

9476672

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

sarthakaggarwal97 reviewed Mar 6, 2026

View reviewed changes

hpatro added 2 commits March 6, 2026 10:23

Rename cluster-message-gossip-perc config name

cff0506

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Merge branch 'unstable' into cluster_ping_gossip_count

cdc2ec2

Exclude pfail nodes from the gossip count accounting

83dc041

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro changed the title ~~Bound the number of nodes in gossip section~~ Bound the number of healthy nodes in gossip section Mar 6, 2026

hpatro changed the title ~~Bound the number of healthy nodes in gossip section~~ Make the number of healthy nodes in gossip section configurable Mar 6, 2026

sarthakaggarwal97 approved these changes Mar 7, 2026

View reviewed changes

zuiderkwast approved these changes Mar 9, 2026

View reviewed changes

Comment thread tests/unit/cluster/packet.tcl Outdated

Address feedback

77eb3b1

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

hpatro merged commit 0ff55de into valkey-io:unstable Mar 9, 2026
71 of 72 checks passed

JimB123 added a commit to JimB123/valkey that referenced this pull request Mar 10, 2026

Revert "Make the number of healthy nodes in gossip section configurab…

5e4c4a7

…le (valkey-io#2746)" This reverts commit 0ff55de.

hpatro mentioned this pull request Apr 8, 2026

[TEST-FAILURE] Gossip count scales with higher percentage of cluster-message-gossip-perc in tests/unit/cluster/packet.tcl #3454

Closed

Uh oh!

Conversation

hpatro commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Oct 17, 2025

Uh oh!

hpatro commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hpatro commented Oct 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

zuiderkwast left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hpatro commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sarthakaggarwal97 Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

hpatro Mar 6, 2026

Choose a reason for hiding this comment

Uh oh!

zuiderkwast commented Mar 6, 2026

Uh oh!

hpatro commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sarthakaggarwal97 left a comment

Choose a reason for hiding this comment

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hpatro commented Oct 17, 2025 •

edited

Loading

hpatro commented Oct 17, 2025 •

edited

Loading

hpatro commented Oct 18, 2025 •

edited

Loading

zuiderkwast left a comment •

edited

Loading

hpatro commented Mar 5, 2026 •

edited

Loading

hpatro commented Mar 6, 2026 •

edited

Loading

codecov Bot commented Mar 6, 2026 •

edited

Loading