Fix two primaries scenario due to unknown shard_id by deepakrn · Pull Request #2586 · valkey-io/valkey

deepakrn · 2025-09-05T05:05:54Z

This PR tries to address issue #2261 where a certain order of events leads to two primaries in the same shard.

Fundamentally, the problem with the current state of the system is this: if a node has not been directly PINGed by another node and has only learned about it via gossip, it associates that other node with a random shard_id. As a result, whenever the node with the random shard_id advertises its slots, this node first concludes that it has lost all its slots to a node from a different shard, and only afterwards processes the shard_id change. This leaves this node in a state where it is also a primary in the same shard. Similarly, when this node receives slot configs via an UPDATE message, it won't know the shard_id of the node in question until it receives a direct PING from it.

This indicates that knowing a node's shard_id is a prerequisite for processing any slot config updates from that node. This change introduces a flag that tracks whether a node's shard_id is initialized; while it is not, the rest of the code can ignore certain updates until the shard_id has been learned. The flag also makes the code more robust and extensible: if other scenarios that need the shard_id to be set come up in the future, they can check the same flag.
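To make the mechanism concrete, here is a minimal sketch of the flag's lifecycle, assuming a simplified node struct with a boolean field. `SketchNode`, `SKETCH_ID_LEN` (set to 40 here, playing the role of `CLUSTER_NAMELEN`), `sketchNodeFromGossip`, and `sketchNodeLearnShardId` are hypothetical names for illustration, not Valkey's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ID length, standing in for CLUSTER_NAMELEN. */
#define SKETCH_ID_LEN 40

typedef struct {
    char shard_id[SKETCH_ID_LEN];
    bool shard_id_initialized; /* false while shard_id is a random placeholder */
} SketchNode;

/* Fills an ID with random hex characters (not NUL-terminated, like the real IDs). */
static void randomId(char *out) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < SKETCH_ID_LEN; i++) out[i] = hex[rand() % 16];
}

/* A node learned only through gossip gets a random shard_id that must not be trusted. */
static void sketchNodeFromGossip(SketchNode *n) {
    assert(n != NULL);
    randomId(n->shard_id);
    n->shard_id_initialized = false;
}

/* A direct PING carrying the shard_id ping extension supplies the real value. */
static void sketchNodeLearnShardId(SketchNode *n, const char *shard_id) {
    assert(n != NULL && shard_id != NULL);
    memcpy(n->shard_id, shard_id, SKETCH_ID_LEN);
    n->shard_id_initialized = true;
}
```

Code that needs a trustworthy shard_id (for example, slot config processing) would check `shard_id_initialized` before acting.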

Fixes #2261

deepakrn · 2025-09-05T05:08:03Z

@murphyjacob4 - could you please help review this change? I do plan to clean up the tests and remove all the sleeps. Apart from that, I'd like your input on the high-level approach to solving this.

#2261 describes the sequence of events.

In this change:

I am preventing a node from processing any slot config updates from a sender whose shard_id is not yet initialized.
I am changing the processing order so that the ping extensions (and hence the shard_id) are handled before slot configs are updated.
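The two changes listed above, as initially proposed (the reordering was later dropped in favor of only gating slot updates), could be sketched as follows. `SketchSender`, `sketchProcessPing`, and `sketchMaybeApplySlotConfig` are hypothetical names; the sketch assumes the shard_id arrives in a ping extension and that a skipped slot claim is advertised again in a later message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool shard_id_initialized; /* false while the sender's shard_id is a placeholder */
    int slot_updates_applied;  /* counts accepted slot-config updates, for illustration */
} SketchSender;

/* Applies the sender's advertised slot config only if its shard_id is known.
 * Returns true if the update was applied, false if it was skipped. */
static bool sketchMaybeApplySlotConfig(SketchSender *sender) {
    assert(sender != NULL);
    if (!sender->shard_id_initialized) return false; /* skip; assumed re-advertised later */
    sender->slot_updates_applied++;
    return true;
}

/* Handles one PING from the sender. has_shard_id_ext says whether the message
 * carried the shard_id ping extension. */
static bool sketchProcessPing(SketchSender *sender, bool has_shard_id_ext) {
    assert(sender != NULL);
    /* Step 1 (the reordering): process ping extensions first; they may set the shard_id. */
    if (has_shard_id_ext) sender->shard_id_initialized = true;
    /* Step 2: only then consider the slots the sender claims. */
    return sketchMaybeApplySlotConfig(sender);
}
```

With only the gating kept, step 1 would stay where it was in the original code, and the check in `sketchMaybeApplySlotConfig` alone prevents the premature slot-loss handling.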

Copilot

Pull Request Overview

This PR addresses a "two primaries scenario" issue related to unknown shard_id values in cluster environments. The fix introduces a new flag to track nodes with uninitialized shard IDs and modifies processing order to ensure shard IDs are updated before slot configurations.

Introduces CLUSTER_NODE_SHARD_ID_UNINITIALIZED flag to track nodes with unknown shard IDs
Reorders gossip processing before slot configuration updates
Adds safeguards to prevent slot updates when shard ID is uninitialized

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
src/cluster_legacy.h	Defines new flag constant for tracking uninitialized shard IDs
src/cluster_legacy.c	Implements shard ID tracking logic, reorders processing, and adds validation
tests/unit/cluster/shardid-propagation.tcl	Adds comprehensive test cases for shard ID propagation scenarios


Copilot · 2025-09-05T18:04:56Z

             * from pre-7.2 releases */
            clusterRemoveNodeFromShard(myself);
            memcpy(myself->shard_id, shard_id, CLUSTER_NAMELEN);
+            node->flags &= ~CLUSTER_NODE_SHARD_ID_UNINITIALIZED;


The variable node should be myself to match the context. The code is updating myself->shard_id but clearing flags on node.

Suggested change

node->flags &= ~CLUSTER_NODE_SHARD_ID_UNINITIALIZED;

myself->flags &= ~CLUSTER_NODE_SHARD_ID_UNINITIALIZED;

Copilot got this wrong.

However, there is a different issue here. One of the call sites of updateShardId, on line 3932 in clusterProcessPacket, might be looking at an uninitialized shard-id in some corner cases. I am also not entirely sure about clusterSetPrimary. I think the other two call sites, clusterProcessPingExtensions and clusterCommandSpecial, should be fine. So rather than guess here, I wonder if updateShardId should take a clusterNode* instead, so it can check whether the reference node's shard-id is actually initialized. For the case where we need to set the shard-id explicitly, we can set it directly and clear the uninitialized flag instead of calling updateShardId. Thoughts?

To elaborate the scenario where updateShardId in clusterProcessPacket might be looking at an uninitialized shard-id:

1. In node N's view, node A has an uninitialized shard-id because A hasn't directly pinged N yet.

2. Node B (previously a replica of another node) becomes a replica of node A, so the primary claimed by the sender (B) changes. At this point, A's uninitialized shard-id would be associated with B.

So in step 2, we would ideally want node B to take on its primary's shard-id only if that shard-id is initialized?

For clusterSetPrimary, it seems to be exercised during the CLUSTER SETSLOT command, when replica migration is triggered, etc. But in those scenarios, I would expect the target node to be reachable and to have directly pinged this node, so its shard_id should already be initialized.

But, I do like your suggestion. We could stall the update until the shard_id of the target node is initialized. I'll make that change. So, there will be two flavors of updateShardId - one that takes the clusterNode reference and another that takes the shard_id string.
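The two flavors described above might look roughly like this. It is a sketch under the assumption that the shard-id state is a boolean on a simplified struct; `sketchUpdateShardIdFromNode` and `sketchUpdateShardIdFromString` are hypothetical stand-ins for the two `updateShardId` variants, not their real signatures. The node-reference flavor refuses the update while the reference node's shard_id is uninitialized, which covers the scenario where B becomes a replica of A.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ID_LEN 40 /* hypothetical, standing in for CLUSTER_NAMELEN */

typedef struct {
    char shard_id[SKETCH_ID_LEN];
    bool shard_id_initialized;
} SketchNode;

/* Flavor 1: copy the shard_id of a reference node (e.g. a replica's new primary).
 * Returns false and leaves target untouched while ref's shard_id is a placeholder. */
static bool sketchUpdateShardIdFromNode(SketchNode *target, const SketchNode *ref) {
    assert(target != NULL && ref != NULL);
    if (!ref->shard_id_initialized) return false;
    memcpy(target->shard_id, ref->shard_id, SKETCH_ID_LEN);
    target->shard_id_initialized = true;
    return true;
}

/* Flavor 2: set an explicitly known shard_id (e.g. one received in a ping extension)
 * and clear the uninitialized state. */
static void sketchUpdateShardIdFromString(SketchNode *target, const char *shard_id) {
    assert(target != NULL && shard_id != NULL);
    memcpy(target->shard_id, shard_id, SKETCH_ID_LEN);
    target->shard_id_initialized = true;
}
```

Returning `false` leaves B's existing shard_id in place, so the update is effectively stalled until a later message arrives after A's shard_id is known.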

Copilot · 2025-09-05T18:04:57Z

@@ -1,3 +1,4 @@
+if {0} {


The if {0} statement will disable the entire first test block, making it unreachable. This appears to be leftover debugging code that should be removed.

deepakrn · 2025-09-05T18:31:59Z

+        if (sender) {
+            clusterProcessGossipSection(hdr, link);
+            clusterProcessPingExtensions(hdr, link);
+        }


Discussed with @PingXie offline. We could simply leave this processing where it is, rather than moving it earlier, and instead skip slot config updates until the shard_id is initialized. I'll work on making that change.

I'm aligned with this suggestion as well. I don't want to open another Pandora's box by moving them around.

codecov · 2025-09-05T18:50:02Z

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.23%. Comparing base (78060cb) to head (07c84f5).
⚠️ Report is 346 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/debug.c	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2586      +/-   ##
============================================
+ Coverage     72.21%   72.23%   +0.02%     
============================================
  Files           127      127              
  Lines         70936    70948      +12     
============================================
+ Hits          51227    51252      +25     
+ Misses        19709    19696      -13

Files with missing lines	Coverage Δ
src/cluster.c	`89.98% <100.00%> (ø)`
src/cluster_legacy.c	`87.27% <100.00%> (+0.12%)`	⬆️
src/debug.c	`53.90% <0.00%> (ø)`

... and 16 files with indirect coverage changes



PingXie · 2025-09-09T07:00:31Z

@deepakrn can you fix the DCO by git commit --amend -s? Since you have multiple commits already, I think the easiest thing to do is to force-push at the end. And please make sure future commits are done with -s.

the clang formatting also seems off.

PingXie · 2025-09-09T07:30:00Z



zuiderkwast · 2025-09-12T08:26:12Z

@murphyjacob4 Why is this a major decision? I can't see any new API or any breaking change here. It looks more like a bug fix to me.

deepakrn · 2025-09-12T16:50:28Z

@zuiderkwast - My initial solution had a flag (unknownshard) indicating that a node has an uninitialized shard ID. Introducing this new flag was what we thought could be a change in interface. However, I later updated the solution to simply filter out nodes with uninitialized shard IDs when saving to nodes.conf.

For CLUSTER NODES and other topology related commands, there is no change in behavior. So, I think this should not be breaking anymore.
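A sketch of the filtering described above, assuming a simplified node list. `SketchNode` and `sketchCollectNodes` are hypothetical names, and the single `for_nodes_conf` parameter stands in for what are really two separate code paths (saving nodes.conf versus generating CLUSTER NODES output).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    bool shard_id_initialized; /* false while the shard_id is a random placeholder */
} SketchNode;

/* Collects the nodes that would be written out and returns how many there are.
 * for_nodes_conf == true models saving nodes.conf: nodes whose shard_id is still a
 * placeholder are skipped. for_nodes_conf == false models CLUSTER NODES output,
 * which keeps listing every known node. `out` must have room for `count` entries. */
static size_t sketchCollectNodes(const SketchNode *nodes, size_t count,
                                 bool for_nodes_conf, const SketchNode **out) {
    assert(nodes != NULL && out != NULL);
    size_t emitted = 0;
    for (size_t i = 0; i < count; i++) {
        if (for_nodes_conf && !nodes[i].shard_id_initialized) continue;
        out[emitted++] = &nodes[i];
    }
    return emitted;
}
```

Skipping such nodes when persisting presumably keeps a random placeholder from being reloaded as if it were a real shard_id after a restart, while client-visible topology output is unchanged.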

cc - @murphyjacob4

PingXie · 2025-09-15T21:28:42Z

 #define CLUSTER_NODE_EXTENSIONS_SUPPORTED (1 << 10)        /* This node supports extensions. */
 #define CLUSTER_NODE_LIGHT_HDR_PUBLISH_SUPPORTED (1 << 11) /* This node supports light message header for publish type. */
 #define CLUSTER_NODE_LIGHT_HDR_MODULE_SUPPORTED (1 << 12)  /* This node supports light message header for module type. */
+#define CLUSTER_NODE_SHARD_ID_UNINITIALIZED (1 << 13)      /* This node currently has a random shard_id assigned. */


I think CLUSTER_NODE_SHARD_ID_UNINITIALIZED should be a local state and we don't need to use the precious node flags space.

Thanks for pointing this out, @PingXie. I have updated the changes to use a local flag.


github-actions Bot assigned deepakrn Sep 5, 2025

PingXie requested a review from Copilot September 5, 2025 18:04

Copilot AI reviewed Sep 5, 2025


deepakrn commented Sep 5, 2025


murphyjacob4 self-requested a review September 5, 2025 18:56

murphyjacob4 added the major-decision-pending Major decision pending by TSC team label Sep 5, 2025

deepakrn added 2 commits September 7, 2025 03:03

Fix two primaries scenario due to unknown shard_id

496a136

See valkey-io#2261 for more details. Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

Fixing unit tests to not use sleeps and undoing order processing ping extensions

a67719f

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

deepakrn force-pushed the two-primaries branch from e3dbfbc to a67719f September 7, 2025 03:05

deepakrn added 3 commits September 7, 2025 15:05

Minor: typo fix

fa052bd

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

Merge branch 'valkey-io:unstable' into two-primaries

09bdb49

Continue to display nodes with uninitialized shards in cluster nodes but not in nodes.conf

2e93cec

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

PingXie reviewed Sep 9, 2025


Fixing the node flags filter for saving topology to nodes.conf

8354ddd

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

deepakrn force-pushed the two-primaries branch from 477e7e9 to 8354ddd September 9, 2025 17:04

deepakrn added 4 commits September 9, 2025 17:19

CLANG format fixes

ff09991

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

Adding a test to assert nodes.conf not having nodes with uninitialized shard-ids

ebf0390

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

Undo commenting out some unit tests in shardid-propagation

15e25f2

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

reject shard-id updates if the target node's shard is uninitialized

467118f

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

hpatro mentioned this pull request Sep 11, 2025

[BUG] Two primaries in same shard #2261

Open

deepakrn added 3 commits September 12, 2025 00:01

Heal partitions at the end of test to allow graceful shutdown

6b433ee

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

Merge branch 'valkey-io:unstable' into two-primaries

6d8a17a

Limiting code comment width to < 100 column length

eea2d0e

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

PingXie reviewed Sep 15, 2025


Merge branch 'valkey-io:unstable' into two-primaries

b491014

deepakrn force-pushed the two-primaries branch from cbbb318 to 334a39b September 20, 2025 00:02

Using local node flags and macros for uninitialized shardid

07c84f5

Signed-off-by: Deepak Nandihalli <deepak.nandihalli@gmail.com>

deepakrn force-pushed the two-primaries branch from 334a39b to 07c84f5 September 20, 2025 00:08

enjoy-binbin mentioned this pull request Dec 29, 2025

Fix empty shard reconfiguration after CLUSTER RESET SOFT #2989

Open

6 participants

deepakrn commented Sep 5, 2025 •

edited

Loading

deepakrn commented Sep 5, 2025 •

edited

Loading

codecov Bot commented Sep 5, 2025 •

edited

Loading