Overview of the Issue
EmergencyReparentShard stops replication on the replicas and puts the primary into non-serving mode via DemotePrimary, running both in parallel across all the tablets in a cluster / shard.
This makes sense - when calling EmergencyReparentShard, the failover is non-graceful and best-effort, as the state of the primary is potentially unknown and we can't wait for DemotePrimary to finish "normally".
Unfortunately, we don't actually give DemotePrimary any time to finish what it's doing. As soon as we get the responses back from the replicas, we cancel the context passed to DemotePrimary, which cancels whatever work hasn't been performed yet - effectively preventing the primary from being properly demoted.
This problematic behaviour seems to be hidden by the SetReplicationSource command that happens a bit later, which attempts to attach the old primary to the new primary and forces it to switch into REPLICA mode.
Reproduction Steps
N/A
Binary Version
Operating System and Environment details
Log Fragments