Preventing temporary circular replication and slot loss in Redis Cluster Failover #13018

I've encountered an issue during a manual failover in a Redis cluster: for a short time, some nodes hold an incorrect view of the cluster state.

The normal scenario for a failover is as follows:

```mermaid
sequenceDiagram
actor user
user->>NodeA: redis-cli cluster failover
NodeA->>NodeB: manual failover start
activate NodeB
NodeB->>NodeA: ping with offset
NodeB->>NodeA: ping with offset
NodeA->>NodeC: auth failover
NodeC->>NodeA: vote
Note over NodeA: cluster nodes NodeA: master, 0-100 NodeB: master, 0-0 NodeC: master, 101-200
NodeA->>NodeB: PONG, i'm master, slot 0-100
Note over NodeB: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
deactivate NodeB
NodeA->>NodeC: PONG, i'm master, slot 0-100
Note over NodeC: cluster nodes NodeA: master, 0-100 NodeB: master, 0-0 NodeC: master, 101-200
NodeB->>NodeA: PONG, i'm replica
Note over NodeA: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
NodeB->>NodeC: PONG, i'm replica
Note over NodeC: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
```

However, if NodeA's PONG to NodeC is delayed or dropped (because of network latency or for any other reason), NodeC's view of the cluster becomes incorrect:

```mermaid
sequenceDiagram
actor user
user->>NodeA: redis-cli cluster failover
NodeA->>NodeB: manual failover start
activate NodeB
NodeB->>NodeA: ping with offset
NodeB->>NodeA: ping with offset
NodeA->>NodeC: auth failover
NodeC->>NodeA: vote
Note over NodeA: cluster nodes NodeA: master, 0-100 NodeB: master, 0-0 NodeC: master, 101-200
NodeA->>NodeB: PONG, i'm master, slot 0-100
Note over NodeB: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
deactivate NodeB
NodeB->>NodeA: PONG, i'm replica
Note over NodeA: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
NodeB->>NodeC: PONG, i'm replica
Note over NodeC: cluster nodes NodeA: replica of NodeB NodeB: replica of NodeA NodeC: master, 101-200
Note over NodeA: delayed for some reason...
NodeA->>NodeC: PONG, i'm master, slot 0-100
Note over NodeC: cluster nodes NodeA: master, 0-100 NodeB: replica of NodeA NodeC: master, 101-200
```

In this case, NodeC sees NodeA and NodeB as replicas of each other (circular replication), and slots 0-100 have no owner in NodeC's view. This state persists until NodeA's PONG reaches NodeC. The situation is easy to reproduce by dropping packets from NodeA to NodeC with iptables.
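To make the ordering problem concrete, here is a small Python model of NodeC's view. It is only a sketch, not Redis code: `NodeView`, `apply_pong`, and the other names are hypothetical, epochs and gossip sections are ignored, and the slot range 0-200 is taken from the diagrams.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

ALL_SLOTS = set(range(0, 201))  # slots 0-200, as in the diagrams


@dataclass
class NodeView:
    """How one node is recorded in NodeC's table (simplified, hypothetical)."""
    slots: Set[int] = field(default_factory=set)
    replica_of: Optional[str] = None  # None means "master"


def initial_view_on_node_c() -> Dict[str, NodeView]:
    """NodeC's view before the failover: NodeB is master, NodeA its replica."""
    return {
        "NodeA": NodeView(replica_of="NodeB"),
        "NodeB": NodeView(slots=set(range(0, 101))),
        "NodeC": NodeView(slots=set(range(101, 201))),
    }


def apply_pong(view: Dict[str, NodeView], sender: str,
               slots: Iterable[int] = (), master: Optional[str] = None) -> None:
    """Update the view from a PONG; `master` is set when the sender is a replica."""
    claimed = set(slots)
    node = view[sender]
    if master is None:
        # Sender announces itself as a master: it takes over the claimed slots.
        for other in view.values():
            other.slots -= claimed
        node.replica_of = None
        node.slots = claimed
    else:
        # Sender announces itself as a replica: it no longer owns any slots.
        node.replica_of = master
        node.slots = set()


def has_replication_cycle(view: Dict[str, NodeView]) -> bool:
    """True if two nodes are recorded as replicas of each other."""
    return any(
        n.replica_of is not None and view[n.replica_of].replica_of == name
        for name, n in view.items()
    )


def unowned_slots(view: Dict[str, NodeView]) -> Set[int]:
    """Slots that no node owns in this view."""
    owned = set().union(*(n.slots for n in view.values()))
    return ALL_SLOTS - owned


view = initial_view_on_node_c()
apply_pong(view, "NodeB", master="NodeA")        # NodeB's PONG arrives first
print(has_replication_cycle(view), len(unowned_slots(view)))  # True 101
apply_pong(view, "NodeA", slots=range(0, 101))   # NodeA's delayed PONG arrives
print(has_replication_cycle(view), len(unowned_slots(view)))  # False 0
```

Between the two `apply_pong` calls the model is in the same state as the second diagram: a replication cycle between NodeA and NodeB, with slots 0-100 unowned.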

I propose delaying the transition when a detected status change would lead to this incorrect state. Specifically, if a sender announces that it is becoming a replica, but in the receiver's view the sender still owns slots and the new master is still a replica of the sender, then the receiver should delay turning the sender into a replica. This prevents the temporary circular replication and slot loss, and may also avoid additional problems (e.g. https://github.com/redis/redis/pull/10489#issuecomment-1728593084 , https://github.com/lettuce-io/lettuce-core/issues/2578 ), though I'm not sure about those.
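The condition described above can be sketched as a single predicate. This is a Python illustration under stated assumptions, not a patch against the Redis source: `KnownNode` and `should_delay_demotion` are hypothetical names, and the table stands for the receiving node's local view of the cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class KnownNode:
    """An entry in the receiver's node table (simplified, hypothetical)."""
    slots: Set[int] = field(default_factory=set)
    replica_of: Optional[str] = None  # None means "master"


def should_delay_demotion(table: Dict[str, KnownNode],
                          sender: str, announced_master: str) -> bool:
    """Return True if turning `sender` into a replica should be postponed.

    The demotion is delayed only when, in the receiver's current view,
    the sender still owns slots AND the node it wants to replicate is
    itself still recorded as a replica of the sender.
    """
    sender_entry = table[sender]
    master_entry = table.get(announced_master)
    if master_entry is None:
        return False  # unknown master: nothing to protect against here
    return bool(sender_entry.slots) and master_entry.replica_of == sender


# NodeC's view before NodeA's PONG has arrived.
table = {
    "NodeA": KnownNode(replica_of="NodeB"),
    "NodeB": KnownNode(slots=set(range(0, 101))),
    "NodeC": KnownNode(slots=set(range(101, 201))),
}
# NodeB says "I'm a replica of NodeA", but NodeA still looks like NodeB's replica.
print(should_delay_demotion(table, "NodeB", "NodeA"))  # True

# Once NodeA's PONG is processed, NodeA is a master owning slots 0-100.
table["NodeA"] = KnownNode(slots=set(range(0, 101)))
table["NodeB"].slots = set()
print(should_delay_demotion(table, "NodeB", "NodeA"))  # False
```

With this check, NodeC would keep treating NodeB as the owner of slots 0-100 until NodeA's promotion is known, rather than passing through a state in which nobody owns them.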

The cost of this proposal is that, during a network partition, a master's transition to replica is delayed. However, the scenario where the message announcing that the old master has become a replica arrives before the message announcing the new master's promotion is rare and unlikely to occur in most deployments. Additionally, experiencing one or two extra `MOVED` errors because of this delay is safer than not being able to find a node for the slot at all.

If you need any further explanation or details about the situation, please let me know.
