Atomically update cluster and replication layer while marking self node as primary by hpatro · Pull Request #2510 · valkey-io/valkey

hpatro · 2025-08-19T20:29:11Z

With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer.

… primary Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro · 2025-08-19T20:46:31Z

This makes the code correct around updating state for cluster and replication layer together. I was not able to come up with a test case to produce this scenario.

codecov · 2025-08-19T20:48:19Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.02%. Comparing base (d7993b7) to head (13b6114).
⚠️ Report is 25 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2510      +/-   ##
============================================
- Coverage     72.03%   72.02%   -0.01%     
============================================
  Files           126      126              
  Lines         70435    70487      +52     
============================================
+ Hits          50740    50771      +31     
- Misses        19695    19716      +21

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`87.44% <100.00%> (-0.31%)`	⬇️

... and 22 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

zuiderkwast

Looks reasonable to me.

Until now, how and when does a node stop replicating after becoming a primary e.g. after a failover?

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro · 2025-08-19T22:04:23Z

Until now, how and when does a node stop replicating after becoming a primary e.g. after a failover?

replicationUnsetPrimary was invoked independently . It's an idempotent method. One example:

valkey/src/cluster_legacy.c

Lines 5096 to 5106 in eae23ba

    
           void clusterFailoverReplaceYourPrimary(void) { 
        
               clusterNode *old_primary = myself->replicaof; 
        
               if (clusterNodeIsPrimary(myself) || old_primary == NULL) return; 
        
               serverLog(LL_NOTICE, "Setting myself to primary in shard %.40s after failover; my old primary is %.40s (%s)", 
        
                         myself->shard_id, old_primary->name, old_primary->human_nodename); 
        
               /* 1) Turn this node into a primary. */ 
        
               clusterSetNodeAsPrimary(myself); 
        
               replicationUnsetPrimary();

…de as primary (valkey-io#2510) Fixes: valkey-io#2460 With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

…de as primary (#2510) Fixes: #2460 With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

…de as primary (valkey-io#2510) Fixes: valkey-io#2460 With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer. --------- Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com>

When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see #2283. However, the log added in #2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of #2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com>

…3192) When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary we will print: ``` serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id); ``` In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, which causes us to print an error shard ID log first. There is an exmaple, when a replica node performs a cluster reset, we will print: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4 ``` But the node shard id is actually: ``` xxx> cluster myshardid "52ede26d1554dd203161ba09011af14574b2cc84" ``` Now after a new shard ID is generated we will print a log, and we also move the call to clusterSetNodeAsPrimary after the new shard id, so that we can have the right one. After this PR: ``` xxx * Cluster reset (user request from 'xxx'). xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66. xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66 ``` This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short time, so i am gonna extracting it separately as a log fix (or improvement). Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

…de as primary (redis#2510) Picked from valkey-io/valkey#2510 Fixes: redis#2460 With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer.

clusterNode.shard_id is a fixed-size char[CLUSTER_NAMELEN] buffer that is not guaranteed to be NUL-terminated, so it must be printed with %.40s. This was introduced in #2510. Signed-off-by: Binbin <binloveplay1314@qq.com>

Atomically update cluster and replication layer while marking self as…

30ed8af

… primary Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

hpatro requested review from PingXie, enjoy-binbin, madolson and zuiderkwast August 19, 2025 20:47

hpatro self-assigned this Aug 19, 2025

zuiderkwast approved these changes Aug 19, 2025

View reviewed changes

sarthakaggarwal97 approved these changes Aug 19, 2025

View reviewed changes

madolson reviewed Aug 19, 2025

View reviewed changes

Comment thread src/cluster_legacy.c

cleanup

13b6114

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

madolson approved these changes Aug 19, 2025

View reviewed changes

hpatro merged commit e3bfc5f into valkey-io:unstable Aug 20, 2025
52 checks passed

enjoy-binbin mentioned this pull request Feb 12, 2026

Logging fix or improvement around new shard ID generation #3192

Merged

hpatro mentioned this pull request Jun 3, 2026

Crash on divergence in view at cluster layer and replication layer for self #2513

Closed

enjoy-binbin added this to Valkey 9.0 Jun 9, 2026

github-project-automation Bot moved this to To be backported in Valkey 9.0 Jun 9, 2026

enjoy-binbin mentioned this pull request Jun 9, 2026

Fix shard_id format specifier in UPDATE message log #3942

Merged

valkeyrie-ops Bot moved this from To be backported to Done in Valkey 9.0 Jun 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Atomically update cluster and replication layer while marking self node as primary#2510

Atomically update cluster and replication layer while marking self node as primary#2510
hpatro merged 2 commits into
valkey-io:unstablefrom
hpatro:atomic_cluster_replication_layer

hpatro commented Aug 19, 2025

Uh oh!

hpatro commented Aug 19, 2025

Uh oh!

codecov Bot commented Aug 19, 2025 •

edited

Loading

Uh oh!

zuiderkwast left a comment

Uh oh!

Uh oh!

hpatro commented Aug 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

hpatro commented Aug 19, 2025

Uh oh!

hpatro commented Aug 19, 2025

Uh oh!

codecov Bot commented Aug 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hpatro commented Aug 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov Bot commented Aug 19, 2025 •

edited

Loading