Skip to content

Atomically update cluster and replication layer while marking self node as primary#2510

Merged
hpatro merged 2 commits into
valkey-io:unstablefrom
hpatro:atomic_cluster_replication_layer
Aug 20, 2025
Merged

Atomically update cluster and replication layer while marking self node as primary#2510
hpatro merged 2 commits into
valkey-io:unstablefrom
hpatro:atomic_cluster_replication_layer

Conversation

@hpatro

@hpatro hpatro commented Aug 19, 2025

Copy link
Copy Markdown
Contributor

Fixes: #2460

With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer.

… primary

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro

hpatro commented Aug 19, 2025

Copy link
Copy Markdown
Contributor Author

This makes the code correct around updating state for cluster and replication layer together. I was not able to come up with a test case to produce this scenario.

@codecov

codecov Bot commented Aug 19, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.02%. Comparing base (d7993b7) to head (13b6114).
⚠️ Report is 25 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2510      +/-   ##
============================================
- Coverage     72.03%   72.02%   -0.01%     
============================================
  Files           126      126              
  Lines         70435    70487      +52     
============================================
+ Hits          50740    50771      +31     
- Misses        19695    19716      +21     
Files with missing lines Coverage Δ
src/cluster_legacy.c 87.44% <100.00%> (-0.31%) ⬇️

... and 22 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hpatro hpatro self-assigned this Aug 19, 2025

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me.

Until now, how and when does a node stop replicating after becoming a primary e.g. after a failover?

Comment thread src/cluster_legacy.c
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
@hpatro

hpatro commented Aug 19, 2025

Copy link
Copy Markdown
Contributor Author

Until now, how and when does a node stop replicating after becoming a primary e.g. after a failover?

replicationUnsetPrimary was invoked independently . It's an idempotent method. One example:

valkey/src/cluster_legacy.c

Lines 5096 to 5106 in eae23ba

void clusterFailoverReplaceYourPrimary(void) {
clusterNode *old_primary = myself->replicaof;
if (clusterNodeIsPrimary(myself) || old_primary == NULL) return;
serverLog(LL_NOTICE, "Setting myself to primary in shard %.40s after failover; my old primary is %.40s (%s)",
myself->shard_id, old_primary->name, old_primary->human_nodename);
/* 1) Turn this node into a primary. */
clusterSetNodeAsPrimary(myself);
replicationUnsetPrimary();

@hpatro hpatro merged commit e3bfc5f into valkey-io:unstable Aug 20, 2025
52 checks passed
rjd15372 pushed a commit to rjd15372/valkey that referenced this pull request Sep 19, 2025
…de as primary (valkey-io#2510)

Fixes: valkey-io#2460

With this change we avoid divergence in cluster and replication layer
view. I've observed node can be marked as primary in cluster while it
can be marked as replica in replication layer view and have a active
replication link. Without this change, we used to end up in a invalid
replica chain in replication layer.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
rjd15372 pushed a commit that referenced this pull request Sep 23, 2025
…de as primary (#2510)

Fixes: #2460

With this change we avoid divergence in cluster and replication layer
view. I've observed node can be marked as primary in cluster while it
can be marked as replica in replication layer view and have a active
replication link. Without this change, we used to end up in a invalid
replica chain in replication layer.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro added a commit to hpatro/valkey that referenced this pull request Oct 3, 2025
…de as primary (valkey-io#2510)

Fixes: valkey-io#2460

With this change we avoid divergence in cluster and replication layer
view. I've observed node can be marked as primary in cluster while it
can be marked as replica in replication layer view and have a active
replication link. Without this change, we used to end up in a invalid
replica chain in replication layer.

---------

Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Feb 12, 2026
When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see valkey-io#2283.

However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node shard id is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
enjoy-binbin added a commit that referenced this pull request Feb 24, 2026
When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see #2283.

However, the log added in #2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node shard id is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of #2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
…3192)

When a cluster reset is performed on a replica node, a new shard ID is generated
because the node is about to become an empty primary node, see valkey-io#2283.

However, the log added in valkey-io#2510 caused some confusions. In clusterSetNodeAsPrimary
we will print:
```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new
shard ID, which causes us to print an error shard ID log first.

There is an exmaple, when a replica node performs a cluster reset, we will print:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node shard id is actually:
```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now after a new shard ID is generated we will print a log, and we also move the
call to clusterSetNodeAsPrimary after the new shard id, so that we can have the
right one. After this PR:
```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but i guess we won't merge the extension fix in a short
time, so i am gonna extracting it separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request May 25, 2026
…de as primary (redis#2510)

Picked from valkey-io/valkey#2510

Fixes: redis#2460

With this change we avoid divergence in cluster and replication layer
view. I've observed node can be marked as primary in cluster while it
can be marked as replica in replication layer view and have a active
replication link. Without this change, we used to end up in a invalid
replica chain in replication layer.
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.0 Jun 9, 2026
hpatro pushed a commit that referenced this pull request Jun 9, 2026
clusterNode.shard_id is a fixed-size char[CLUSTER_NAMELEN] buffer
that is not guaranteed to be NUL-terminated, so it must be printed
with %.40s.

This was introduced in #2510.

Signed-off-by: Binbin <binloveplay1314@qq.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 10, 2026
clusterNode.shard_id is a fixed-size char[CLUSTER_NAMELEN] buffer
that is not guaranteed to be NUL-terminated, so it must be printed
with %.40s.

This was introduced in #2510.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@valkeyrie-ops valkeyrie-ops Bot moved this from To be backported to Done in Valkey 9.0 Jun 16, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 22, 2026
clusterNode.shard_id is a fixed-size char[CLUSTER_NAMELEN] buffer
that is not guaranteed to be NUL-terminated, so it must be printed
with %.40s.

This was introduced in #2510.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Divergence in cluster and replication layer with node promotion in empty shard

5 participants