Add support for recovery of async/semisync replicas of failed replication group members #1254
shlomi-noach
left a comment
please see inline comments
go/logic/topology_recovery.go
so we re-use the configuration
but in analysis_dao.go it seems like you've changed that: intermediate master recovery only takes place under
if !a.IsReplicationGroupMember {
What I mean here is that we re-use the analysisEntry.ClusterDetails.HasAutomatedIntermediateMasterRecovery configuration to decide whether to fail over group members, rather than adding a separate configuration. As mentioned in the method's doc comment, we operate under the assumption that group secondaries with replicas are akin to intermediate masters, in the sense that they perform a very similar function in the replication chain: they get and apply changes from the primary (via GR instead of the binlog) and distribute them to replicas (via the binlog). I hope this clarifies my intent.
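To make the intent concrete, here is a minimal sketch of that decision. It is not the code in this PR: only the field names HasAutomatedIntermediateMasterRecovery and IsReplicationGroupMember come from this thread, while the simplified types, the CountReplicas field, and the function name are hypothetical stand-ins.

```go
package main

import "fmt"

// ClusterDetails is a simplified stand-in for the real structure.
// Only the field name is taken from the discussion.
type ClusterDetails struct {
	HasAutomatedIntermediateMasterRecovery bool
}

// AnalysisEntry is a simplified stand-in. CountReplicas is a
// hypothetical field used here to mean "has async/semi-sync replicas".
type AnalysisEntry struct {
	IsReplicationGroupMember bool
	CountReplicas            int
	ClusterDetails           ClusterDetails
}

// shouldRecoverGroupMemberReplicas (hypothetical name) treats a failed
// group member with replicas like an intermediate master: automated
// recovery is gated by the existing intermediate master setting, not by
// a new, separate configuration.
func shouldRecoverGroupMemberReplicas(e AnalysisEntry) bool {
	if !e.IsReplicationGroupMember {
		return false // not a group member; handled by the regular paths
	}
	if e.CountReplicas == 0 {
		return false // no replicas to relocate
	}
	return e.ClusterDetails.HasAutomatedIntermediateMasterRecovery
}

func main() {
	entry := AnalysisEntry{
		IsReplicationGroupMember: true,
		CountReplicas:            2,
		ClusterDetails: ClusterDetails{
			HasAutomatedIntermediateMasterRecovery: true,
		},
	}
	fmt.Println(shouldRecoverGroupMemberReplicas(entry))
}
```

The point of the sketch is only that a single existing flag gates both cases; how the real recovery function checks for replicas may differ.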
go/logic/topology_recovery.go
I don't run Group Replication myself, but I think it is debatable whether running PostIntermediateMasterFailoverProcesses is correct here. For now, let's keep it as is, but I predict that someone will argue against this in the future.
For now, our use case does not seem to require different hooks for these. If the need arises (or someone comes knocking at your door asking for it) I'd be happy to change this to have different GR and intermediate source hooks.
…tion group members.
Related issue: #1253
Description
This PR addresses the issue mentioned above. It does so by adding failure detection and recovery for replication group members that have traditional async/semi-sync replicas.
cc @sjmudd, @dveeden, @luisyonaldo.