I've encountered an issue with some nodes in a Redis cluster during a manual failover, where the state seen by certain nodes becomes incorrect.
The normal scenario for a failover is as follows:
sequenceDiagram
actor user
user->>NodeA: redis-cli cluster failover
NodeA->>NodeB: manual failover start
activate NodeB
NodeB->>NodeA: ping with offset
NodeB->>NodeA: ping with offset
NodeA->>NodeC: auth failover
NodeC->>NodeA: vote
Note over NodeA: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: master, 0-0<br/>NodeC: master, 101-200
NodeA->>NodeB: PONG, i'm master, slot 0-100
Note over NodeB: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
deactivate NodeB
NodeA->>NodeC: PONG, i'm master, slot 0-100
Note over NodeC: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: master, 0-0<br/>NodeC: master, 101-200
NodeB->>NodeA: PONG, i'm replica
Note over NodeA: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
NodeB->>NodeC: PONG, i'm replica
Note over NodeC: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
However, if NodeA's message is not delivered due to network latency or other reasons, the state viewed by NodeC becomes incorrect:
sequenceDiagram
actor user
user->>NodeA: redis-cli cluster failover
NodeA->>NodeB: manual failover start
activate NodeB
NodeB->>NodeA: ping with offset
NodeB->>NodeA: ping with offset
NodeA->>NodeC: auth failover
NodeC->>NodeA: vote
Note over NodeA: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: master, 0-0<br/>NodeC: master, 101-200
NodeA->>NodeB: PONG, i'm master, slot 0-100
Note over NodeB: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
deactivate NodeB
NodeB->>NodeA: PONG, i'm replica
Note over NodeA: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
NodeB->>NodeC: PONG, i'm replica
Note over NodeC: cluster nodes<br/>NodeA: replica of NodeB<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
Note over NodeA: delayed for some reason...
NodeA->>NodeC: PONG, i'm master, slot 0-100
Note over NodeC: cluster nodes<br/>NodeA: master, 0-100<br/>NodeB: replica of NodeA<br/>NodeC: master, 101-200
In this case, NodeC recognizes NodeA and NodeB as being in a circular replication state, and some slots are lost. This state persists until NodeA sends a PONG to NodeC. This situation can be easily reproduced by dropping packets from NodeA to NodeC using iptables.
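To make the bad ordering concrete, here is a minimal sketch of NodeC's view in Python. It is a toy model and not Redis code: the `view` dictionary and the helpers `apply_pong`, `has_replication_cycle`, and `uncovered_slots` are hypothetical names invented for illustration, and the slot range 0-200 simply mirrors the diagrams above.

```python
# Toy model of NodeC's cluster view. Not Redis code; all names are illustrative.

def apply_pong(view, sender, role, master=None, slots=()):
    """Overwrite what the view records about `sender`, as the receiver would on a PONG."""
    view[sender] = {"role": role, "master": master, "slots": set(slots)}

def has_replication_cycle(view):
    """Return True if following 'replica of' links from any node loops back on itself."""
    for name in view:
        seen = set()
        node = name
        while view[node]["role"] == "replica":
            if node in seen:
                return True
            seen.add(node)
            node = view[node]["master"]
    return False

def uncovered_slots(view, all_slots):
    """Slots that no master in the view currently claims."""
    covered = set()
    for info in view.values():
        if info["role"] == "master":
            covered |= info["slots"]
    return set(all_slots) - covered

ALL_SLOTS = range(0, 201)

# NodeC's view before the failover: NodeB is the master, NodeA its replica.
node_c_view = {
    "NodeA": {"role": "replica", "master": "NodeB", "slots": set()},
    "NodeB": {"role": "master", "master": None, "slots": set(range(0, 101))},
    "NodeC": {"role": "master", "master": None, "slots": set(range(101, 201))},
}

# NodeB's "I'm a replica of NodeA" arrives first; NodeA's PONG is still delayed.
apply_pong(node_c_view, "NodeB", "replica", master="NodeA")
cycle_before = has_replication_cycle(node_c_view)
lost_before = uncovered_slots(node_c_view, ALL_SLOTS)
print(cycle_before, len(lost_before))  # True 101

# NodeA's delayed "I'm master, slots 0-100" finally arrives and repairs the view.
apply_pong(node_c_view, "NodeA", "master", slots=range(0, 101))
cycle_after = has_replication_cycle(node_c_view)
lost_after = uncovered_slots(node_c_view, ALL_SLOTS)
print(cycle_after, len(lost_after))  # False 0
```

Between the two messages, the model shows both symptoms at once: NodeA and NodeB point at each other, and slots 0-100 have no owner from NodeC's point of view.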
I propose delaying the transition when a received status change would put the receiver's view into an incorrect state. Specifically, if a sender claims to have become a replica while the receiver still sees the sender as owning slots and sees the sender's new master as a replica of the sender, the receiver should delay turning the sender into a replica. This can prevent temporary circular replication and slot loss, and may also avoid additional problems (e.g. #10489 (comment), redis/lettuce#2578), though I am not sure about those.
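The condition in the proposal can be sketched as a small guard. This is a hedged illustration in Python, not a patch against the Redis source: `should_defer_demotion`, `on_replica_claim`, and `on_master_claim` are hypothetical helpers, and the model assumes that a master's slot claim always wins (in real Redis that depends on config epochs).

```python
# Toy model of the proposed guard. Not Redis code; all names are illustrative.

def should_defer_demotion(view, sender, claimed_master):
    """True when applying 'sender is a replica of claimed_master' would create a
    cycle: the sender still owns slots here, and the claimed master is still
    recorded as a replica of the sender."""
    sender_info = view[sender]
    master_info = view.get(claimed_master)
    if master_info is None:
        return False
    return (
        sender_info["role"] == "master"
        and len(sender_info["slots"]) > 0
        and master_info["role"] == "replica"
        and master_info["master"] == sender
    )

def on_replica_claim(view, sender, claimed_master):
    """Apply the demotion unless the guard says to wait. Returns True if applied."""
    if should_defer_demotion(view, sender, claimed_master):
        return False  # keep the old, still-consistent view for now
    view[sender] = {"role": "replica", "master": claimed_master, "slots": set()}
    return True

def on_master_claim(view, sender, slots):
    """Assumption: the sender's claim wins, so the slots move to the sender."""
    claimed = set(slots)
    for info in view.values():
        info["slots"] -= claimed
    view[sender] = {"role": "master", "master": None, "slots": claimed}

# NodeC's view before the failover.
view = {
    "NodeA": {"role": "replica", "master": "NodeB", "slots": set()},
    "NodeB": {"role": "master", "master": None, "slots": set(range(0, 101))},
    "NodeC": {"role": "master", "master": None, "slots": set(range(101, 201))},
}

# NodeB's demotion arrives before NodeA's promotion: the guard defers it.
first = on_replica_claim(view, "NodeB", "NodeA")
print(first, view["NodeB"]["role"])  # False master

# NodeA's delayed PONG arrives and takes over slots 0-100.
on_master_claim(view, "NodeA", range(0, 101))

# NodeB's next PONG repeats the replica claim; it is now safe to apply.
second = on_replica_claim(view, "NodeB", "NodeA")
print(second, view["NodeB"]["master"])  # True NodeA
```

With the guard in place, the view in this model never passes through the circular state: NodeB stays a slot-owning master until NodeA's promotion is known, and is demoted on its next message.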
The proposed behavior does delay the transition of a master to a replica in the event of a network partition. However, the scenario where the old master receives the message to become a replica before the message promoting the new master is rare in practice. Additionally, experiencing one or two extra `MOVED` errors due to this delay is safer than not being able to find a node at all.
If you need any further explanation or details about the situation, please let me know.