This repository was archived by the owner on Feb 18, 2025. It is now read-only.

EnableMasterSSL with graceful-master-takeover-auto errors #1279

@dtest

Description

On orchestrator 3.2.3, running a graceful-master-takeover-auto when AllowTLS is set for the instance throws an error saying SSL replication cannot be enabled because the replication threads are already running:

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql1:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql2:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]

$ orchestrator-client -c graceful-master-takeover-auto -i mysql1 -d mysql2
EnableMasterSSL: Cannot enable SSL replication on mysql1:3306 because replication threads are not stopped
{ "Id": 1, "UID": "1607638055531065398:a0a3933a4dd565b60e3becf7cd4c1b6b27060e69c59a862a4b06e0ebcec5f744", "AnalysisEntry": { "AnalyzedInstanceKey": { "Hostname": "mysql1", "Port": 3306 }, "AnalyzedInstanceMasterKey": { "Hostname": "", "Port": 0 }, "ClusterDetails": { "ClusterName": "mysql1:3306", "ClusterAlias": "mysql1:3306", "ClusterDomain": "", "CountInstances": 3, "HeuristicLag": 0, "HasAutomatedMasterRecovery": true, "HasAutomatedIntermediateMasterRecovery": true }, "AnalyzedInstanceDataCenter": "", "AnalyzedInstanceRegion": "", "AnalyzedInstancePhysicalEnvironment": "", "AnalyzedInstanceBinlogCoordinates": { "LogFile": "8141a32e30e9-bin.000003", "LogPos": 644, "Type": 0 }, "IsMaster": true, "IsReplicationGroupMember": false, "IsCoMaster": false, "LastCheckValid": true, "LastCheckPartialSuccess": true, "CountReplicas": 1, "CountValidReplicas": 1, "CountValidReplicatingReplicas": 1, "CountReplicasFailingToConnectToMaster": 0, "CountDowntimedReplicas": 0, "ReplicationDepth": 0, "Replicas": [ { "Hostname": "mysql2", "Port": 3306 } ], "SlaveHosts": [ { "Hostname": "mysql2", "Port": 3306 } ], "IsFailingToConnectToMaster": false, "Analysis": "DeadMaster", "Description": "", "StructureAnalysis": null, "IsDowntimed": false, "IsReplicasDowntimed": false, "DowntimeEndTimestamp": "", "DowntimeRemainingSeconds": 0, "IsBinlogServer": false, "PseudoGTIDImmediateTopology": false, "OracleGTIDImmediateTopology": true, "MariaDBGTIDImmediateTopology": false, "BinlogServerImmediateTopology": false, "SemiSyncMasterEnabled": false, "SemiSyncMasterStatus": false, "SemiSyncMasterWaitForReplicaCount": 0, "SemiSyncMasterClients": 0, "CountSemiSyncReplicasEnabled": 0, "CountLoggingReplicas": 1, "CountStatementBasedLoggingReplicas": 0, "CountMixedBasedLoggingReplicas": 0, "CountRowBasedLoggingReplicas": 1, "CountDistinctMajorVersionsLoggingReplicas": 1, "CountDelayedReplicas": 0, "CountLaggingReplicas": 0, "IsActionableRecovery": true, "ProcessingNodeHostname": "6235160f51df", 
"ProcessingNodeToken": "690e32e28307d9520449c66844e0003960815b84d5eddc64bdb811fcf14e2f06", "CountAdditionalAgreeingNodes": 0, "StartActivePeriod": "", "SkippableDueToDowntime": false, "GTIDMode": "ON", "MinReplicaGTIDMode": "ON", "MaxReplicaGTIDMode": "ON", "MaxReplicaGTIDErrant": "", "CommandHint": "graceful-master-takeover", "IsReadOnly": false }, "SuccessorKey": { "Hostname": "mysql2", "Port": 3306 }, "SuccessorAlias": "", "IsActive": false, "IsSuccessful": true, "LostReplicas": [], "ParticipatingInstanceKeys": [], "AllErrors": [], "RecoveryStartTimestamp": "", "RecoveryEndTimestamp": "", "ProcessingNodeHostname": "", "ProcessingNodeToken": "", "Acknowledged": false, "AcknowledgedAt": "", "AcknowledgedBy": "", "AcknowledgedComment": "", "LastDetectionId": 0, "RelatedRecoveryId": 0, "Type": "MasterRecovery", "RecoveryType": "MasterRecoveryGTID" }

$ orchestrator-client -c topology -a $(orchestrator-client -c clusters)
mysql2:3306   [0s,ok,5.7.32-log,rw,ROW,>>,GTID]
+ mysql1:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]
+ mysql3:3306 [0s,ok,5.7.32-log,ro,ROW,>>,GTID]

Here are the relevant logs:

2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 62: write-recovery-step
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: waiting on 1 postponed functions
2020-12-10 22:07:36 DEBUG PostponedFunctionsContainer: done waiting
2020-12-10 22:07:36 INFO topology_recovery: Executed 1 postponed functions
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 63: write-recovery-step
2020-12-10 22:07:36 INFO topology_recovery: Executed postponed functions: regroup-replicas-gtid mysql2:3306
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 64: write-recovery-step
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master on mysql1:3306 to mysql2:3306, b83336727d97-bin.000001:3075084
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master on mysql1:3306 to: mysql2:3306, b83336727d97-bin.000001:3075084. GTID: true
2020-12-10 22:07:36 DEBUG ChangeMasterTo: will attempt changing master credentials on mysql1:3306
2020-12-10 22:07:36 INFO ChangeMasterTo: Changed master credentials on mysql1:3306
2020-12-10 22:07:36 INFO Started replication on mysql1:3306
2020-12-10 22:07:36 INFO topology_recovery: No PostGracefulTakeoverProcesses hooks to run
2020-12-10 22:07:36 DEBUG orchestrator/raft: applying command 65: write-recovery-step
[martini] Completed 500 Internal Server Error in 845.505737ms

From the logs, it seems replication is started first, and then a subsequent command fails with a 500 (almost certainly the EnableMasterSSL call). I traced it to these lines.

In my case, replication was successfully started anyway because I do not require SSL, but in environments where replication requires SSL, I think the takeover would fail outright.

Is it intentional that the EnableMasterSSL call comes after the 'auto' block?
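To illustrate the suspected ordering problem, here is a minimal toy model of the sequence (the `replica` type and the `buggyOrder`/`fixedOrder` functions are illustrative names, not orchestrator's actual code; the underlying MySQL constraint is that `CHANGE MASTER TO ... MASTER_SSL=1` can only be issued while the replication threads are stopped):

```go
package main

import (
	"errors"
	"fmt"
)

// replica models the minimal replication state relevant to this bug:
// SSL can only be (re)configured while the replication threads are stopped.
type replica struct {
	threadsRunning bool
	sslEnabled     bool
}

// enableMasterSSL mirrors the failing check: configuring MASTER_SSL
// requires that replication threads are stopped.
func (r *replica) enableMasterSSL() error {
	if r.threadsRunning {
		return errors.New("cannot enable SSL replication: replication threads are not stopped")
	}
	r.sslEnabled = true
	return nil
}

func (r *replica) startReplication() {
	r.threadsRunning = true
}

// buggyOrder reproduces the reported sequence: replication is started
// first, so the subsequent SSL configuration fails.
func buggyOrder(r *replica) error {
	r.startReplication()
	return r.enableMasterSSL()
}

// fixedOrder configures SSL while the threads are still stopped, then
// starts replication.
func fixedOrder(r *replica) error {
	if err := r.enableMasterSSL(); err != nil {
		return err
	}
	r.startReplication()
	return nil
}

func main() {
	a := &replica{}
	fmt.Println("buggy order:", buggyOrder(a))

	b := &replica{}
	fmt.Println("fixed order:", fixedOrder(b), "sslEnabled:", b.sslEnabled)
}
```

With the buggy ordering the SSL step always errors, matching the log above; moving the SSL configuration before the start of replication avoids the conflict.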
