Avoid cluster.nodes load corruption due to shard-id generation #13468
Conversation
@stevelipinski I did it on purpose. #13428 is a fix for generating corrupt files, but I'm not sure it makes sense to automatically fix a corrupted cluster config file; it might hide other issues that cause config corruption.
#13428 does not fully prevent a corrupt config file, specifically when other nodes are unreachable, so it is not a complete solution for this. This is verified by simply taking an old 7.0 cluster.nodes file and spinning up a single instance to use that file. It starts successfully, but if you run CLUSTER SAVECONFIG, it generates a corrupt file. If you SHUTDOWN and then start that instance again, it will not start. The logic in clusterLoadConfig() is a bit flawed in that it uses random shard-ids for nodes that have not yet been loaded from cluster.nodes. So, if a replica line is parsed first, its primary gets a random shard-id.
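To make the load-order flaw concrete, here is a minimal Python simulation of the behavior described above. It is not the actual C code from cluster.c; `load_line` and `random_shard_id` are hypothetical stand-ins for the line-parsing logic in clusterLoadConfig().

```python
import secrets

def random_shard_id():
    # Stand-in for Redis' 40-hex-char generated shard ids.
    return secrets.token_hex(20)

nodes = {}  # node id -> {"shard_id": ..., "master": ...}

def load_line(node_id, master_id=None, shard_id=None):
    """Parse one cluster.nodes line. If a replica's master has not been
    loaded yet, a placeholder master is created with a *random* shard id,
    mirroring the flaw described above."""
    if master_id and master_id not in nodes:
        nodes[master_id] = {"shard_id": random_shard_id(), "master": None}
    nodes[node_id] = {"shard_id": shard_id or random_shard_id(),
                      "master": master_id}

# A replica line parsed before its primary's line:
load_line("replica-1", master_id="primary-1", shard_id="aaaa")
# The placeholder primary got a random shard id, not one tied to the
# replica's shard id from the file -- the seed of the corruption.
assert nodes["primary-1"]["shard_id"] != "aaaa"
```

With this ordering, the primary's in-memory shard id never matches what the file implies for its replica, which is exactly the mismatch the save/reload cycle then reports as corruption.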
@stevelipinski do you have steps to reproduce?
Sure.
- Using /tmp/cluster.nodes:
- /tmp/redis.conf:
- Run a single server instance:
- Dump the cluster config and shut down:
- Attempt to restart: fails:
@stevelipinski thanks, you're right, but I still don't like automatically fixing a corrupt config while loading it.
Arguably, that's no different from what occurs once the node re-establishes communication with the other nodes: the master's shard_id takes precedence over the randomly generated one. And the config generated during upgrade from an old, shard_id-less cluster.nodes is corrupt as well, because it again just randomly generates shard_ids.
@stevelipinski but after this PR (excluding the auto-fix), we shouldn't generate corrupted config. If we fix it automatically and there are still potential bugs that can cause the corruption, we'll never know.
Agreed. However, fixing it such that you still catch corruption and error on it would require a lot more refactoring and changing of code in cluster.c; this was a trade-off to keep the complexity of the change minimal. The bigger issue is that when a slave entry is loaded from cluster.nodes and there is no master for that slave yet (the master is on a subsequent line in the file), a "temporary" master object is created with a random shard-id (not the one from the file). This is where some corruption is leaking in: the random shard-id generation during the cluster.nodes loading process. Maybe there is a better solution, but what I put here is the least intrusive, and results in behavior very similar to what happens when the cluster is communicating anyway (e.g., the master's shard-id overrides whatever gets loaded from the file).
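The fix described above can be sketched as follows. This is an illustrative Python model, not the cluster.c implementation: a replica always inherits its primary's shard id (the one in its own file line is ignored), and a placeholder primary created for a not-yet-loaded master is corrected once the master's own line is parsed.

```python
import secrets

nodes = {}  # node id -> {"shard_id": ..., "replicas": [...]}

def load_line(node_id, master_id=None, shard_id=None):
    if master_id:  # replica line
        if master_id not in nodes:
            # Placeholder primary with a temporary generated shard id.
            nodes[master_id] = {"shard_id": secrets.token_hex(20),
                                "replicas": []}
        nodes[master_id]["replicas"].append(node_id)
        # Ignore the replica's own shard_id from the file;
        # use its primary's instead.
        nodes[node_id] = {"shard_id": nodes[master_id]["shard_id"],
                          "replicas": []}
    else:  # primary line
        if node_id in nodes:
            # A placeholder was created earlier: fix its shard id and
            # re-align any replicas that inherited the temporary one.
            nodes[node_id]["shard_id"] = shard_id
            for r in nodes[node_id]["replicas"]:
                nodes[r]["shard_id"] = shard_id
        else:
            nodes[node_id] = {"shard_id": shard_id, "replicas": []}

# Replica appears before its primary in the file:
load_line("replica-1", master_id="primary-1", shard_id="stale-from-file")
load_line("primary-1", shard_id="real-shard-id")
assert nodes["replica-1"]["shard_id"] == "real-shard-id"
```

After both lines are loaded, primary and replica agree on the primary's shard id, which matches the post-gossip behavior described in the comment above.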
@stevelipinski thanks, I agree with your solution; let's wait for the opinions of others.
@sundb I don't presume to know that area well enough to give you an opinion.
@oranagra thanks, I was worried about possible side effects; let's leave it open a while longer.
There may be a better way to resolve this, but from the looks of it, that would take some major refactoring of clusterLoadConfig(), which I'm not comfortable making.
tezc left a comment:
I'm not familiar with cluster code but conceptually looks good to me!
@oranagra not sure if it's a breaking change: before this PR we couldn't start the server if the config was corrupted, but now we will fix it automatically.
@stevelipinski merged, thanks. |
If it wasn't working (or was unusable) and now it does, then it's not a breaking change 😄. No other side effects.
not sure if this failure is related to this PR: |
PR #13428 doesn't fully resolve an issue where corruption errors can still occur on loading of the cluster.nodes file, seen on upgrade where there were no shard_ids (from old Redis): 7.2.5 loading generated new random ones and persisted them to the file before gossip/handshake could propagate the correct ones (or while some other nodes were unreachable). This results in a primary/replica having differing shard_ids in cluster.nodes, after which the server cannot start up and reports corruption.

This PR builds on #13428 by simply ignoring the replica's shard_id in cluster.nodes (if it exists) and using the replica's primary's shard_id instead. Additional handling was necessary to cover the case where the replica appears before the primary in cluster.nodes: the load first uses a generated shard_id for the primary, then corrects it after loading the primary's cluster.nodes entry.

Co-authored-by: debing.sun <debing.sun@redis.com>
Revert "Avoid cluster.nodes load corruption due to shard-id generation (redis#13468)". This reverts commit 52b6c0a.
Close #13868. This bug was introduced by #13468.

## Issue
To maintain compatibility with older versions that do not support shardid, when a replica passes a shardid, we also update the master's shardid accordingly. However, when both the master and replica support shardid, an issue arises: at one moment, the master may pass a shardid, causing us to update both the master and all its replicas to match the master's shardid. But if the replica later passes a different shardid, we would then update the master's shardid again, leading to continuous changes in shardid.

## Solution
Regardless of the situation, we always ensure that the replica's shardid remains consistent with the master's shardid.
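The update rule from that fix can be sketched in a few lines of Python. This is an illustrative model, not Redis' gossip code; `update_shard_id` is a hypothetical name for the handler that applies a shard id announced by another node.

```python
# Toy cluster state: one master with one replica, both in shard "s1".
shard_ids = {"master-a": "s1", "replica-a1": "s1"}
masters = {"replica-a1": "master-a"}  # replica -> its master

def update_shard_id(node, announced):
    """Apply a shard id announced via gossip, keeping replicas
    consistent with their master in all cases."""
    if node in masters:
        # A replica never overwrites anyone's shard id; it is simply
        # re-aligned with whatever its master currently has.
        shard_ids[node] = shard_ids[masters[node]]
    else:
        shard_ids[node] = announced
        # Propagate the master's shard id to all of its replicas.
        for r, m in masters.items():
            if m == node:
                shard_ids[r] = announced

update_shard_id("master-a", "s2")    # master announces a new shard id
assert shard_ids["replica-a1"] == "s2"
update_shard_id("replica-a1", "s3")  # replica's differing id is ignored
assert shard_ids["master-a"] == "s2"
```

Because the replica's announcement can never flow "up" into the master, the ping-pong of continuous shardid changes described in the issue cannot occur.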
- We generate a random shard id when creating a cluster node, so `auxFieldHandlers[af_shard_id].isPresent(n) == 0` is never met, which means we never add master nodes into `cluster.shards` when loading. This bug was introduced in #10536, which added the `shard-id` concept. BTW, #13468 allows replicas to be added into `cluster.shards`.
- A replica's shard_id may differ from its master's, so before we add it again, we should remove it; otherwise `cluster.shards` will have dirty shards. Introduced in #13468.

These bugs cause the output of `cluster shards` to be corrupt: the temporary `slot_info` may not be cleaned up, and we may see duplicated slot ranges and a shard entry without a master.
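The "remove before re-adding" fix in the second bullet can be illustrated with a plain dict standing in for the `cluster.shards` structure. This is a hedged sketch; `add_node_to_shard` and its parameters are hypothetical, not the real cluster.c API.

```python
shards = {}  # shard_id -> list of node ids in that shard

def add_node_to_shard(shard_id, node, current_shard=None):
    """File `node` under `shard_id`. If the node was previously filed
    under a different (stale) shard id, drop it there first; otherwise
    the stale entry lingers as a dirty shard."""
    if current_shard is not None and current_shard != shard_id:
        shards[current_shard].remove(node)
        if not shards[current_shard]:
            del shards[current_shard]  # no nodes left: drop the shard
    shards.setdefault(shard_id, []).append(node)

# Replica first lands in a stale shard, then moves to its master's shard:
add_node_to_shard("stale-shard", "replica-1")
add_node_to_shard("real-shard", "replica-1", current_shard="stale-shard")
assert "stale-shard" not in shards      # no dirty shard left behind
assert shards["real-shard"] == ["replica-1"]
```

Without the removal step, `shards` would still contain `"stale-shard"`, which is the dirty-shard symptom that corrupts the `cluster shards` output.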