
Conversation

@stevelipinski
Contributor

@stevelipinski stevelipinski commented Aug 8, 2024

PR #13428 doesn't fully resolve an issue where corruption errors can still occur when loading the cluster.nodes file. This was seen on upgrade where there were no shard_ids (from old Redis): 7.2.5 generated new random ones on load and persisted them to the file before gossip/handshake could propagate the correct ones (or while some other nodes were unreachable).
This results in a primary and its replica having differing shard_ids in cluster.nodes, and the server then cannot start up - it reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in cluster.nodes (if it exists) and using the replica's primary's shard_id instead.
Additional handling was necessary to cover the case where the replica appears before the primary in cluster.nodes: the replica first uses a generated shard_id for the primary, and then corrects it after the primary's cluster.nodes entry is loaded.
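
To illustrate, here is a minimal standalone sketch of the loading behavior described above; the struct and helper names are hypothetical, not the actual cluster.c code:

/* Minimal standalone sketch (hypothetical names, not cluster.c): a replica
 * adopts its primary's shard_id (ignoring the shard-id field on its own
 * line), and a primary first seen only as a placeholder gets a generated
 * shard_id that is corrected, and re-propagated to attached replicas, once
 * the primary's own line is parsed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHARD_ID_LEN 40

typedef struct Node {
    char name[41];
    char shard_id[SHARD_ID_LEN + 1];
    struct Node *primary;            /* NULL for a primary node */
} Node;

/* Stand-in for the random shard_id generated for a node that has not been
 * loaded from cluster.nodes yet. */
static void gen_random_shard_id(char *out) {
    for (int i = 0; i < SHARD_ID_LEN; i++)
        out[i] = "0123456789abcdef"[rand() % 16];
    out[SHARD_ID_LEN] = '\0';
}

/* Replica line parsed: attach to the primary and take its shard_id,
 * ignoring whatever shard-id the replica's own line carried. */
static void load_replica(Node *replica, Node *primary) {
    replica->primary = primary;
    snprintf(replica->shard_id, sizeof(replica->shard_id), "%s", primary->shard_id);
}

/* Primary line parsed later: replace the placeholder shard_id with the one
 * from the file and propagate it to replicas already attached. */
static void load_primary(Node *primary, const char *shard_id_from_file,
                         Node **all, int n) {
    snprintf(primary->shard_id, sizeof(primary->shard_id), "%s", shard_id_from_file);
    for (int i = 0; i < n; i++)
        if (all[i]->primary == primary)
            snprintf(all[i]->shard_id, sizeof(all[i]->shard_id), "%s", primary->shard_id);
}

int main(void) {
    Node primary = {"f13c75ad...", "", NULL};
    Node replica = {"0c3dfe61...", "", NULL};
    Node *all[] = {&primary, &replica};

    /* The replica's line appears first, so the primary exists only as a
     * placeholder with a generated shard_id. */
    gen_random_shard_id(primary.shard_id);
    load_replica(&replica, &primary);

    /* The primary's own line appears later with its real shard-id field. */
    load_primary(&primary, "182a927aed17e3c1bebd00f746ab3650d2e5d176", all, 2);

    printf("primary=%s\nreplica=%s\n", primary.shard_id, replica.shard_id);
    return 0;
}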

@sundb
Collaborator

sundb commented Aug 9, 2024

@stevelipinski I did it on purpose; #13428 is a fix for generating corrupt files, but I'm not sure it makes sense to automatically fix a corrupted cluster config file - it might hide other issues that cause config corruption.

@stevelipinski
Contributor Author

#13428 does not fully prevent a corrupt config file, specifically when other nodes are unreachable, so it is not a complete solution for this. This can be verified by simply taking an old 7.0 cluster.nodes file and spinning up a single instance to use that file. It starts successfully, but if you run CLUSTER SAVECONFIG it generates a corrupt file. If you SHUTDOWN and then start that instance again, it will not start.

It seems the logic in clusterLoadConfig() is a bit flawed in that it uses random shard-ids for nodes that have not yet been loaded from cluster.nodes. So, if a replica line is parsed first, then its primary has a random shard-id.

@sundb
Collaborator

sundb commented Aug 9, 2024

@stevelipinski do you have the reproduction steps?

@stevelipinski
Contributor Author

stevelipinski commented Aug 9, 2024

Sure.

using /tmp/cluster.nodes:

10e7ffc21cc7a7b570f43691ae82b0494e69c913 10.234.140.250:6379@16379 master - 0 1722985884529 9 connected 3277-6553
f13c75ad62684ef01a2958d0d2d32c2a138d4232 10.234.129.133:6379@16379 master - 0 1722985884000 8 connected 6554-9829
e0975496e64a72915138f99adb322b31274be87a 10.234.156.181:6379@16379 myself,master - 0 1722985884000 1 connected 0-3276
0c3dfe612191e2b8c0011589835d24cf07054069 10.234.94.187:6379@16379 slave f13c75ad62684ef01a2958d0d2d32c2a138d4232 0 1722985884529 8 connected
e474e464c3c201ce7b6da55d7d5dc0d01bdfb0b9 10.234.136.80:6379@16379 slave fb9cb25641d9dee1047dbd86f894ab5a2ecca7ab 0 1722985884000 6 connected
c2badfb58c306208034ced68bd8261ddece95738 10.234.140.59:6379@16379 slave,fail 10e7ffc21cc7a7b570f43691ae82b0494e69c913 1722985882008 1722985880498 9 disconnected
09bd2139e57c80645b457840bc44dbbf72714a20 10.234.232.144:6379@16379 slave fd3417b6bfa9b422f9a13359e269e6d9ad462206 0 1722985885133 7 connected
dcb26d1b3d89b0e225ad36250fa6596151d33b9f 10.234.233.3:6379@16379 slave e0975496e64a72915138f99adb322b31274be87a 0 1722985884000 1 connected
fd3417b6bfa9b422f9a13359e269e6d9ad462206 10.234.187.11:6379@16379 master - 0 1722985884529 7 connected 9830-13106
vars currentEpoch 9 lastVoteEpoch 9

/tmp/redis.conf:

cluster-enabled yes
cluster-config-file /tmp/cluster.nodes
cluster-node-timeout 5000

Run single server instance:
# redis-server-7.2.5.patch /tmp/redis.conf

dump cluster config and shutdown:
# redis-cli CLUSTER SAVECONFIG
# redis-cli SHUTDOWN

Attempt to restart:
# redis-server-7.2.5.patch /tmp/redis.conf

Fails:

25117:M 09 Aug 2024 14:18:52.000 # Unrecoverable error: corrupted cluster config file "f13c75ad62684ef01a2958d0d2d32c2a138d4232 10.234.129.133:6379@16379,,tls-port=0,shard-id=182a927aed17e3c1bebd00f746ab3650d2e5d176 master,fail? - 1723213119306 1723213116776 8 connected 6554-9829

@sundb
Collaborator

sundb commented Aug 10, 2024

@stevelipinski thanks, you're right, but I still don't like automatically fixing it when loading a corrupt config.

@stevelipinski
Contributor Author

Arguably, that's no different from what happens once it re-establishes communication with the other nodes: the master's shard_id takes precedence over the randomly generated one.

And the config generated during an upgrade from an old, shard_id-less cluster.nodes is corrupt as well, because it again just randomly generates shard_ids. The sketch below makes this concrete.
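
A minimal standalone sketch of that corruption path; the names are hypothetical, not the actual cluster.c code, and it assumes the loader rejects a replica whose persisted shard-id disagrees with its primary's:

/* Standalone sketch (hypothetical names, not cluster.c): an old shard_id-less
 * file gives every node an independently generated shard_id, SAVECONFIG
 * persists the mismatch while the other nodes are unreachable, and the next
 * load then refuses to start because the replica's shard-id differs from its
 * primary's. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char shard_id[41];
} Node;

static void gen_random_shard_id(char *out) {
    for (int i = 0; i < 40; i++)
        out[i] = "0123456789abcdef"[rand() % 16];
    out[40] = '\0';
}

int main(void) {
    Node primary, replica;

    /* Loading a 7.0-style file with no shard-id fields: each node gets its
     * own random shard_id because gossip never corrected them. */
    gen_random_shard_id(primary.shard_id);
    gen_random_shard_id(replica.shard_id);

    /* CLUSTER SAVECONFIG writes both values out as-is; on the next startup
     * a check of this kind sees the replica/primary mismatch and refuses to
     * start (the pre-fix behavior). With this PR the replica's value from
     * the file is ignored in favor of the primary's, so it no longer trips. */
    if (strcmp(primary.shard_id, replica.shard_id) != 0) {
        fprintf(stderr, "Unrecoverable error: corrupted cluster config file\n");
        return 1;
    }
    return 0;
}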

@sundb
Collaborator

sundb commented Aug 13, 2024

@stevelipinski but after this PR (excluding the auto fix), we shouldn't generate a corrupted config; if we fix it automatically and there are still potential bugs that can cause corruption, we'll never know.

@sundb sundb requested a review from oranagra August 13, 2024 02:05
@stevelipinski
Contributor Author

Agreed. However, fixing it such that you still catch corruption and error out on it would require a lot more refactoring of cluster.c. This was a trade-off to keep the complexity of the change minimal.

What is more of an issue is that when a slave entry is loaded from cluster.nodes and there is no master yet for that slave (the master is on a subsequent line in the file), a "temporary" master object is created with a random shard-id (not the one from the file). This is where some corruption leaks in: the random shard-id generation during the cluster.nodes loading process.

Maybe there is a better solution, but what I put here is the least intrusive, and it results in behavior very similar to what happens when the cluster is communicating anyway (e.g., the master's shard-id overrides whatever gets loaded from the file).
And the slave's shard-id from the file is essentially ignored (same as if the slave were communicating with its master).

@sundb
Collaborator

sundb commented Aug 14, 2024

@stevelipinski thanks, i agree with your solution, let's wait for the opinions of others.

@oranagra
Member

@sundb I don't presume to know that area well enough to give you an opinion.
I trust your judgement, and the only advice is a general one regarding the risk of breaking something seriously to solve a rare or minor issue, but I don't think it applies here.

@sundb
Collaborator

sundb commented Aug 14, 2024

@oranagra thanks, I was worried about the possible side effects; let's leave it open for some more time.
@stevelipinski WDYT?

@stevelipinski
Contributor Author

There may be a better way to resolve this, but from the looks of it, that would take some major refactoring of clusterLoadConfig(), which I'm not comfortable making.
I don't really know how shard-id is used across Redis, so my focus was on the loading and convergence of shard-id in cluster.nodes.
Because of this, I have to defer to the more experienced experts here to point out any potential pitfalls or problems with this approach.

Collaborator

@tezc tezc left a comment

I'm not familiar with the cluster code, but conceptually it looks good to me!

@sundb
Collaborator

sundb commented Sep 12, 2024

@oranagra not sure if it's a breaking change; before this PR we couldn't start the server if the config was corrupted, but now we will fix it automatically.

@sundb sundb merged commit d265a61 into redis:unstable Sep 12, 2024
@sundb
Collaborator

sundb commented Sep 12, 2024

@stevelipinski merged, thanks.

@oranagra
Member

if it wasn't working (or was unusable) and now it does, then it's not a breaking change 😄.
The question is whether there are other side effects impacting cases that used to work for someone...

@sundb
Collaborator

sundb commented Sep 12, 2024

if it wasn't working (or was unusable) and now it does, then it's not a breaking change 😄. The question is whether there are other side effects impacting cases that used to work for someone...

no other side effects.

@sundb
Collaborator

sundb commented Sep 15, 2024

not sure if this failure is related to this PR:
https://github.com/redis/redis/actions/runs/10866487920/job/30154082183

YaacovHazan pushed a commit that referenced this pull request Oct 30, 2024
YaacovHazan pushed a commit that referenced this pull request Jan 6, 2025
sundb added a commit to sundb/redis that referenced this pull request Jan 8, 2025
sundb pushed a commit that referenced this pull request Mar 30, 2025
Close #13868

This bug was introduced by #13468

## Issue
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.

However, when both the master and replica support shardid, an issue
arises: in one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.

## Solution
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
@YaacovHazan YaacovHazan mentioned this pull request Apr 22, 2025
@YaacovHazan YaacovHazan moved this from Todo to Done in Redis 7.4 Backport Apr 29, 2025
@sundb sundb added this to Redis 8.0 Aug 15, 2025
@sundb sundb removed this from Redis 8.2 Aug 15, 2025
@sundb sundb moved this from Todo to Done in Redis 8.0 Aug 15, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
ShooterIT added a commit that referenced this pull request Nov 1, 2025
- We generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never met, which means we never add master nodes into `cluster.shards` when loading; this bug was introduced in #10536, which added the `shard-id` concept. BTW, #13468 can make replicas get added into `cluster.shards`

- A replica's shard_id may differ from its master's, so before we add it again we should remove it, otherwise `cluster.shards` will have dirty shards; introduced in #13468

These bugs cause the output of `cluster shards` to be corrupt: the temporary `slot_info` may not be cleaned up, and we may see duplicated slot ranges and a shard entry without a master.
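
A minimal standalone sketch of the "remove before re-add" rule described above (hypothetical types and helpers, not the actual cluster.c code):

/* Standalone sketch (hypothetical names, not cluster.c): when a replica's
 * shard_id is corrected to match its master's, it is first removed from the
 * bucket it currently sits in, otherwise that stale bucket would linger as a
 * "dirty" shard. */
#include <stdio.h>
#include <string.h>

#define MAX_SHARDS 8

typedef struct {
    char shard_id[41];
    int  node_count;   /* number of nodes currently tracked in this shard */
} Shard;

static Shard shards[MAX_SHARDS];
static int shard_count = 0;

/* Find the bucket for a shard_id, creating it if it does not exist yet. */
static Shard *shard_get(const char *shard_id) {
    for (int i = 0; i < shard_count; i++)
        if (strcmp(shards[i].shard_id, shard_id) == 0) return &shards[i];
    Shard *s = &shards[shard_count++];
    snprintf(s->shard_id, sizeof(s->shard_id), "%s", shard_id);
    s->node_count = 0;
    return s;
}

/* Drop a bucket that no longer holds any node so it cannot linger as a
 * dirty shard. */
static void shard_drop_if_empty(Shard *s) {
    if (s->node_count != 0) return;
    *s = shards[--shard_count];
}

/* Move a node between shard buckets: remove it from the old one first, then
 * add it to the new one. */
static void move_node_to_shard(const char *old_shard_id, const char *new_shard_id) {
    Shard *old = shard_get(old_shard_id);
    old->node_count--;
    shard_drop_if_empty(old);
    shard_get(new_shard_id)->node_count++;
}

int main(void) {
    /* A replica is first registered under the (mismatching) shard-id read
     * from the file... */
    shard_get("shard-id-read-from-replica-line")->node_count++;
    /* ...and is then moved under its master's shard-id without leaving a
     * stale bucket behind. */
    move_node_to_shard("shard-id-read-from-replica-line", "shard-id-of-master");
    printf("shards tracked: %d\n", shard_count); /* prints 1, not 2 */
    return 0;
}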