Ignore shardId updates from replica nodes #13877

jdork0 · 2025-03-24T12:16:49Z

This bug was introduced by #13468

Issue

To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly.

However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid.

Solution

Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.

…ry nodes.

sundb · 2025-03-26T13:38:52Z

@jdork0 It seems that this fix is not correct, there are two scenarios:

master supports shardid but replica doesn't.
We have handled this situation correctly. In any case, we need to update the shardid of all replicas under this Master
master doesn't support shardid but replica does.
This scene is actually being fixed by this PR,
If the replica supports shardid, the replica will give a shardid different from its master.
If we update it, its shardid and master will be different.
Therefore, I think we should verify whether the incoming shardid is consistent with the master at any time, and we will update it only if it is consistent.

        if (node->slaveof == NULL) {
            assignShardIdToNode(node, shard_id, CLUSTER_TODO_SAVE_CONFIG);
            for (int i = 0; i < clusterNodeNumSlaves(node); i++) {
                clusterNode *slavenode = clusterNodeGetSlave(node, i);
                if (memcmp(slavenode->shard_id, shard_id, CLUSTER_NAMELEN) != 0)
                    assignShardIdToNode(slavenode, shard_id, CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_FSYNC_CONFIG);
            }
        } else if (memcmp(node->slaveof->shard_id, shard_id, CLUSTER_NAMELEN) == 0) {
            assignShardIdToNode(node, shard_id, CLUSTER_TODO_SAVE_CONFIG);
        }

jdork0 · 2025-03-26T14:56:39Z

@sundb, I think that looks ok. I'll do some testing and update the PR.

jdork0 · 2025-03-26T20:18:28Z

@sundb, updated. I didn't run into any problems in my testing with the proposed change.

sundb · 2025-03-27T05:55:39Z

src/cluster_legacy.c

         * 1. Master supports but the replica does not.
-         *    We might first update the replica's shard-id to the master's randomly
-         *    generated shard-id. Then, when the master's shard-id arrives, we must
-         *    also update all its replicas.
+         *    All replicas under master will receive master shard-id. When coming
+         *    from a release with no shard-id the master shard-id will be randomly
+         *    generated.
         * 2. If the master does not support but the replica does.
-         *    We also need to synchronize the master's shard-id with the replica.
-         * 3. If neither of master and replica supports it.
-         *    The master will have a randomly generated shard-id and will update
-         *    the replica to match the master's shard-id. */
+         *    Only update replica shard-id after master has provided one. */


what about

/* We always make our best effort to keep the shard-id consistent * between the master and its replicas: * * 1. When updating the master's shard-id, we simultaneously update the * shard-id of all its replicas to ensure consistency. * 2. When updating replica's shard-id, if it differs from its master's shard-id, * we discard this replica's shard-id and continue using master's shard-id. * This applies even if the master does not support shard-id, in which * case we rely on the master's randomly generated shard-id. */

jdork0 · 2025-03-27T18:33:23Z

@sundb, is there any timeframe where we could expect this fix to make it back into 7.2?

src/cluster_legacy.c

Close redis#13868 This bug was introduced by redis#13468 ## Issue To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. ## Solution Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.

Close redis#13868 This bug was introduced by redis#13468 To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.

Close redis#13868 This bug was introduced by redis#13468 ## Issue To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master’s shardid accordingly. However, when both the master and replica support shardid, an issue arises: in one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master’s shardid. But if the replica later passes a different shardid, we would then update the master’s shardid again, leading to continuous changes in shardid. ## Solution Regardless of the situation, we always ensure that the replica’s shardid remains consistent with the master’s shardid.

Fix for redis#13868 so shard id updates will only be taken from prima…

2d6abc9

…ry nodes.

jdork0 mentioned this pull request Mar 24, 2025

[BUG] cluster nodes file never stabilizes after cluster creation in 7.2.7 #13868

Closed

sundb added this to Redis 7.4 Backport, Redis 7.2 Backport and Redis 8.2 Mar 24, 2025

github-project-automation bot moved this to Todo in Redis 7.2 Backport Mar 24, 2025

github-project-automation bot moved this to Todo in Redis 7.4 Backport Mar 24, 2025

github-project-automation bot moved this to Todo in Redis 8.2 Mar 24, 2025

sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Mar 24, 2025

Ignore shardId updates from replica nodes

c94ebca

sundb reviewed Mar 27, 2025

View reviewed changes

Ignore shardId updates from replica nodes

c4a5342

sundb approved these changes Mar 27, 2025

View reviewed changes

tezc reviewed Mar 27, 2025

View reviewed changes

src/cluster_legacy.c Show resolved Hide resolved

tezc approved these changes Mar 28, 2025

View reviewed changes

sundb merged commit aa8e2d1 into redis:unstable Mar 30, 2025
18 checks passed

github-project-automation bot moved this from Todo to Done in Redis 8.2 Mar 30, 2025

YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 29, 2025

YaacovHazan moved this from Todo to Done in Redis 7.2 Backport Apr 29, 2025

sundb mentioned this pull request Jun 19, 2025

[BUG] cluster node config has been corrupted #14137

Closed

sundb removed this from Redis 8.2 Aug 15, 2025

sundb added this to Redis 8.0 Aug 15, 2025

github-project-automation bot moved this to Todo in Redis 8.0 Aug 15, 2025

sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ignore shardId updates from replica nodes #13877

Ignore shardId updates from replica nodes #13877

Uh oh!

jdork0 commented Mar 24, 2025 •

edited by sundb

Loading

Uh oh!

sundb commented Mar 26, 2025 •

edited

Loading

Uh oh!

jdork0 commented Mar 26, 2025

Uh oh!

jdork0 commented Mar 26, 2025

Uh oh!

sundb Mar 27, 2025

Uh oh!

jdork0 commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Ignore shardId updates from replica nodes #13877

Ignore shardId updates from replica nodes #13877

Uh oh!

Conversation

jdork0 commented Mar 24, 2025 • edited by sundb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Solution

Uh oh!

sundb commented Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdork0 commented Mar 26, 2025

Uh oh!

jdork0 commented Mar 26, 2025

Uh oh!

sundb Mar 27, 2025

Choose a reason for hiding this comment

Uh oh!

jdork0 commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jdork0 commented Mar 24, 2025 •

edited by sundb

Loading

sundb commented Mar 26, 2025 •

edited

Loading