
Conversation

@stevelipinski
Contributor

@stevelipinski stevelipinski commented Aug 8, 2024

PR #13428 doesn't fully resolve an issue where corruption errors can still occur when loading the cluster.nodes file. This was seen on upgrade where there were no shard_ids (from old Redis): 7.2.5 generated new random ones on load and persisted them to the file before gossip/handshake could propagate the correct ones (or while some other nodes were unreachable).
This results in a primary and its replica having differing shard_ids in cluster.nodes, and the server then cannot start up - it reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in cluster.nodes (if it exists) and using the replica's primary's shard_id instead.
Additional handling was necessary to cover the case where the replica appears before the primary in cluster.nodes: the replica first uses a generated shard_id for the primary, and then corrects it after the primary's cluster.nodes entry is loaded.
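
To illustrate, here is a minimal standalone sketch of the loading behavior described above; the struct and helper names are hypothetical, not the actual cluster.c code:

/* Minimal standalone sketch (hypothetical names, not cluster.c): a replica
 * adopts its primary's shard_id (ignoring the shard-id field on its own
 * line), and a primary first seen only as a placeholder gets a generated
 * shard_id that is corrected, and re-propagated to attached replicas, once
 * the primary's own line is parsed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ID_LEN 40

typedef struct Node {
    char name[41];
    char shard_id[SHARD_ID_LEN + 1];
    struct Node *primary;            /* NULL for a primary node */
} Node;

/* Stand-in for the random shard_id generated for a node that has not been
 * loaded from cluster.nodes yet. */
static void gen_random_shard_id(char *out) {
    for (int i = 0; i < SHARD_ID_LEN; i++)
        out[i] = "0123456789abcdef"[rand() % 16];
    out[SHARD_ID_LEN] = '\0';
}

/* Replica line parsed: attach to the primary and take its shard_id,
 * ignoring whatever shard-id the replica's own line carried. */
static void load_replica(Node *replica, Node *primary) {
    replica->primary = primary;
    snprintf(replica->shard_id, sizeof(replica->shard_id), "%s", primary->shard_id);
}

/* Primary line parsed later: replace the placeholder shard_id with the one
 * from the file and propagate it to replicas already attached. */
static void load_primary(Node *primary, const char *shard_id_from_file,
                         Node **all, int n) {
    snprintf(primary->shard_id, sizeof(primary->shard_id), "%s", shard_id_from_file);
    for (int i = 0; i < n; i++)
        if (all[i]->primary == primary)
            snprintf(all[i]->shard_id, sizeof(all[i]->shard_id), "%s", primary->shard_id);
}

int main(void) {
    Node primary = {"f13c75ad...", "", NULL};
    Node replica = {"0c3dfe61...", "", NULL};
    Node *all[] = {&primary, &replica};

    /* The replica's line appears first, so the primary exists only as a
     * placeholder with a generated shard_id. */
    gen_random_shard_id(primary.shard_id);
    load_replica(&replica, &primary);

    /* The primary's own line appears later with its real shard-id field. */
    load_primary(&primary, "182a927aed17e3c1bebd00f746ab3650d2e5d176", all, 2);

    printf("primary=%s\nreplica=%s\n", primary.shard_id, replica.shard_id);
    return 0;
}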

@sundb
Collaborator

sundb commented Aug 9, 2024

@stevelipinski I did it on purpose; #13428 is a fix for generating corrupt files, but I'm not sure it makes sense to automatically fix a corrupted cluster config file - it might hide other issues that cause config corruption.

@stevelipinski
Contributor Author

#13428 does not fully prevent a corrupt config file, specifically when other nodes are unreachable, so it is not a complete solution for this. This can be verified by simply taking an old 7.0 cluster.nodes file and spinning up a single instance to use that file. It starts successfully, but if you run CLUSTER SAVECONFIG it generates a corrupt file. If you SHUTDOWN and then start that instance again, it will not start.

It seems the logic in clusterLoadConfig() is a bit flawed in that it uses random shard-ids for nodes that have not yet been loaded from cluster.nodes. So, if a replica line is parsed first, then its primary has a random shard-id.

@sundb
Collaborator

sundb commented Aug 9, 2024

@stevelipinski do you have the reproduction steps?

@stevelipinski
Contributor Author

stevelipinski commented Aug 9, 2024

Sure.

using /tmp/cluster.nodes:

10e7ffc21cc7a7b570f43691ae82b0494e69c913 10.234.140.250:6379@16379 master - 0 1722985884529 9 connected 3277-6553
f13c75ad62684ef01a2958d0d2d32c2a138d4232 10.234.129.133:6379@16379 master - 0 1722985884000 8 connected 6554-9829
e0975496e64a72915138f99adb322b31274be87a 10.234.156.181:6379@16379 myself,master - 0 1722985884000 1 connected 0-3276
0c3dfe612191e2b8c0011589835d24cf07054069 10.234.94.187:6379@16379 slave f13c75ad62684ef01a2958d0d2d32c2a138d4232 0 1722985884529 8 connected
e474e464c3c201ce7b6da55d7d5dc0d01bdfb0b9 10.234.136.80:6379@16379 slave fb9cb25641d9dee1047dbd86f894ab5a2ecca7ab 0 1722985884000 6 connected
c2badfb58c306208034ced68bd8261ddece95738 10.234.140.59:6379@16379 slave,fail 10e7ffc21cc7a7b570f43691ae82b0494e69c913 1722985882008 1722985880498 9 disconnected
09bd2139e57c80645b457840bc44dbbf72714a20 10.234.232.144:6379@16379 slave fd3417b6bfa9b422f9a13359e269e6d9ad462206 0 1722985885133 7 connected
dcb26d1b3d89b0e225ad36250fa6596151d33b9f 10.234.233.3:6379@16379 slave e0975496e64a72915138f99adb322b31274be87a 0 1722985884000 1 connected
fd3417b6bfa9b422f9a13359e269e6d9ad462206 10.234.187.11:6379@16379 master - 0 1722985884529 7 connected 9830-13106
vars currentEpoch 9 lastVoteEpoch 9

/tmp/redis.conf:

cluster-enabled yes
cluster-config-file /tmp/cluster.nodes
cluster-node-timeout 5000

Run single server instance:
# redis-server-7.2.5.patch /tmp/redis.conf

dump cluster config and shutdown:
# redis-cli CLUSTER SAVECONFIG
# redis-cli SHUTDOWN

Attempt to restart:
# redis-server-7.2.5.patch /tmp/redis.conf

Fails:

25117:M 09 Aug 2024 14:18:52.000 # Unrecoverable error: corrupted cluster config file "f13c75ad62684ef01a2958d0d2d32c2a138d4232 10.234.129.133:6379@16379,,tls-port=0,shard-id=182a927aed17e3c1bebd00f746ab3650d2e5d176 master,fail? - 1723213119306 1723213116776 8 connected 6554-9829

@sundb
Collaborator

sundb commented Aug 10, 2024

@stevelipinski thanks, you're right, but I still don't like automatically fixing it when loading a corrupt config.

@stevelipinski
Contributor Author

Arguably, that's no different from what happens once it re-establishes communication with the other nodes: the master's shard_id takes precedence over the randomly generated one.

And the config generated during an upgrade from an old, shard_id-less cluster.nodes is corrupt as well, because it again just randomly generates shard_ids. The sketch below makes this concrete.
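
A minimal standalone sketch of that corruption path; the names are hypothetical, not the actual cluster.c code, and it assumes the loader rejects a replica whose persisted shard-id disagrees with its primary's:

/* Standalone sketch (hypothetical names, not cluster.c): an old shard_id-less
 * file gives every node an independently generated shard_id, SAVECONFIG
 * persists the mismatch while the other nodes are unreachable, and the next
 * load then refuses to start because the replica's shard-id differs from its
 * primary's. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char shard_id[41];
} Node;

static void gen_random_shard_id(char *out) {
    for (int i = 0; i < 40; i++)
        out[i] = "0123456789abcdef"[rand() % 16];
    out[40] = '\0';
}

int main(void) {
    Node primary, replica;

    /* Loading a 7.0-style file with no shard-id fields: each node gets its
     * own random shard_id because gossip never corrected them. */
    gen_random_shard_id(primary.shard_id);
    gen_random_shard_id(replica.shard_id);

    /* CLUSTER SAVECONFIG writes both values out as-is; on the next startup
     * a check of this kind sees the replica/primary mismatch and refuses to
     * start (the pre-fix behavior). With this PR the replica's value from
     * the file is ignored in favor of the primary's, so it no longer trips. */
    if (strcmp(primary.shard_id, replica.shard_id) != 0) {
        fprintf(stderr, "Unrecoverable error: corrupted cluster config file\n");
        return 1;
    }
    return 0;
}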

@sundb
Collaborator

sundb commented Aug 13, 2024

@stevelipinski but after this PR (excluding the auto fix), we shouldn't generate a corrupted config; if we fix it automatically and there are still potential bugs that can cause corruption, we'll never know.

@sundb sundb requested a review from oranagra August 13, 2024 02:05
@stevelipinski
Contributor Author

Agreed. However, fixing it such that you still catch corruption and error out on it would require a lot more refactoring of cluster.c. This was a trade-off to keep the complexity of the change minimal.

What is more of an issue is that when a slave entry is loaded from cluster.nodes and there is no master yet for that slave (the master is on a subsequent line in the file), a "temporary" master object is created with a random shard-id (not the one from the file). This is where some corruption leaks in: the random shard-id generation during the cluster.nodes loading process.

Maybe there is a better solution, but what I put here is the least intrusive, and it results in behavior very similar to what happens when the cluster is communicating anyway (e.g., the master's shard-id overrides whatever gets loaded from the file).
And the slave's shard-id from the file is essentially ignored (same as if the slave were communicating with its master).

@sundb
Collaborator

sundb commented Aug 14, 2024

@stevelipinski thanks, i agree with your solution, let's wait for the opinions of others.

@oranagra
Member

@sundb I don't presume to know that area well enough to give you an opinion.
I trust your judgement, and the only advice is a general one regarding the risk of breaking something seriously to solve a rare or minor issue, but I don't think it applies here.

@sundb
Collaborator

sundb commented Aug 14, 2024

@oranagra thanks, I was worried about the possible side effects; let's leave it open for some more time.
@stevelipinski WDYT?

@stevelipinski
Contributor Author

There may be a better way to resolve this, but from the looks of it, that would take some major refactoring of clusterLoadConfig(), which I'm not comfortable making.
I don't really know how shard-id is used across Redis, so my focus was on the loading and convergence of shard-id in cluster.nodes.
Because of this, I have to defer to the more experienced experts here to point out any potential pitfalls or problems with this approach.

Collaborator

@tezc tezc left a comment

I'm not familiar with the cluster code, but conceptually it looks good to me!

@sundb
Collaborator

sundb commented Sep 12, 2024

@oranagra not sure if it's a breaking change; before this PR we couldn't start the server if the config was corrupted, but now we will fix it automatically.

@sundb sundb merged commit d265a61 into redis:unstable Sep 12, 2024
@sundb
Collaborator

sundb commented Sep 12, 2024

@stevelipinski merged, thanks.

@oranagra
Member

if it wasn't working (or was unusable) and now it does, then it's not a breaking change 😄.
The question is whether there are other side effects impacting cases that used to work for someone...

@sundb
Collaborator

sundb commented Sep 12, 2024

if it wasn't working (or was unusable) and now it does, then it's not a breaking change 😄. The question is whether there are other side effects impacting cases that used to work for someone...

no other side effects.

@sundb
Collaborator

sundb commented Sep 15, 2024

not sure if this failure is related to this PR:
https://github.com/redis/redis/actions/runs/10866487920/job/30154082183

YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
sundb added a commit to sundb/redis that referenced this pull request Jan 8, 2025
sundb pushed a commit that referenced this pull request Mar 30, 2025
Close #13868

This bug was introduced by #13468

## Issue
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.

However, when both the master and replica support shardid, an issue
arises: in one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.

## Solution
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
@YaacovHazan YaacovHazan mentioned this pull request Apr 22, 2025
@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 29, 2025
@sundb sundb added this to Redis 8.0 Aug 15, 2025
@sundb sundb removed this from Redis 8.2 Aug 15, 2025
@sundb sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
ShooterIT added a commit that referenced this pull request Nov 1, 2025
- We generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never met, which means we never add master nodes into `cluster.shards` when loading; this bug was introduced in #10536, which added the `shard-id` concept. BTW, #13468 can make replicas get added into `cluster.shards`

- A replica's shard_id may differ from its master's, so before we add it again we should remove it, otherwise `cluster.shards` will have dirty shards; introduced in #13468

These bugs cause the output of `cluster shards` to be corrupt: the temporary `slot_info` may not be cleaned up, and we may see duplicated slot ranges and a shard entry without a master.
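
A minimal standalone sketch of the "remove before re-add" rule described above (hypothetical types and helpers, not the actual cluster.c code):

/* Standalone sketch (hypothetical names, not cluster.c): when a replica's
 * shard_id is corrected to match its master's, it is first removed from the
 * bucket it currently sits in, otherwise that stale bucket would linger as a
 * "dirty" shard. */
#include <stdio.h>
#include <string.h>

#define MAX_SHARDS 8

typedef struct {
    char shard_id[41];
    int  node_count;   /* number of nodes currently tracked in this shard */
} Shard;

static Shard shards[MAX_SHARDS];
static int shard_count = 0;

/* Find the bucket for a shard_id, creating it if it does not exist yet. */
static Shard *shard_get(const char *shard_id) {
    for (int i = 0; i < shard_count; i++)
        if (strcmp(shards[i].shard_id, shard_id) == 0) return &shards[i];
    Shard *s = &shards[shard_count++];
    snprintf(s->shard_id, sizeof(s->shard_id), "%s", shard_id);
    s->node_count = 0;
    return s;
}

/* Drop a bucket that no longer holds any node so it cannot linger as a
 * dirty shard. */
static void shard_drop_if_empty(Shard *s) {
    if (s->node_count != 0) return;
    *s = shards[--shard_count];
}

/* Move a node between shard buckets: remove it from the old one first, then
 * add it to the new one. */
static void move_node_to_shard(const char *old_shard_id, const char *new_shard_id) {
    Shard *old = shard_get(old_shard_id);
    old->node_count--;
    shard_drop_if_empty(old);
    shard_get(new_shard_id)->node_count++;
}

int main(void) {
    /* A replica is first registered under the (mismatching) shard-id read
     * from the file... */
    shard_get("shard-id-read-from-replica-line")->node_count++;
    /* ...and is then moved under its master's shard-id without leaving a
     * stale bucket behind. */
    move_node_to_shard("shard-id-read-from-replica-line", "shard-id-of-master");
    printf("shards tracked: %d\n", shard_count); /* prints 1, not 2 */
    return 0;
}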