This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Failover recovery, graceful takeover don't work w/binlog #824

@jlevene

We set up a test environment with 1 master and 2 slaves. (We were trying, obviously without success, to have a setup where Orchestrator would work "out of the box".)

Here's the Orchestrator setup file: orchestrator.conf.json.txt

We're using ProxySQL to send SELECTs (those without FOR UPDATE) to the slaves and everything else to the master.

Originally, failovers did not work at all. We added read_only=1 to the MySQL config files and added the pre-failover hook recommended by Percona, and at least failover started to work. We didn't do anything else to tell Orchestrator about ProxySQL. (According to the Percona article, the post-failover hook they give is no longer needed.)

In the config:

"PreGracefulTakeoverProcesses": [
     "/tmp/prefailover.sh"
  ],

and the /tmp/prefailover.sh script (here, 10.42.42.42 is the VIP of keepalived for the 2 ProxySQL instances):

#!/bin/bash
 
# Variable exposed by Orchestrator
OldMaster=$ORC_FAILED_HOST
PROXYSQL_HOST="10.42.42.42"
 
# stop accepting connections to old master
(
echo 'UPDATE mysql_servers SET STATUS="OFFLINE_SOFT" WHERE hostname="'"$OldMaster"'";'
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -vvv -uivan -p**** -h ${PROXYSQL_HOST} -P6032
 
# wait while connections are still active and we are in the grace period
CONNUSED=`mysql -uivan -p**** -h ${PROXYSQL_HOST} -P6032 -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND srv_host="'"$OldMaster"'"' -B -N 2> /dev/null`
TRIES=0
while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ]
do
  CONNUSED=`mysql -uivan -p**** -h ${PROXYSQL_HOST} -P6032 -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND srv_host="'"$OldMaster"'"' -B -N 2> /dev/null`
  TRIES=$(($TRIES+1))
  if [ $CONNUSED -ne "0" ]; then
    sleep 0.05
  fi
done
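
For comparison, here is a more defensive sketch of the same wait loop. It is not the Percona script: wait_for_drain and conn_used are hypothetical names introduced for illustration. The idea is only to bound the grace period and to treat a failed or empty admin query as "still in use" rather than letting the numeric test error out.

```shell
#!/bin/bash
# Sketch only: wait_for_drain and conn_used are illustrative names,
# not part of Orchestrator or ProxySQL.
#
# Usage: wait_for_drain MAX_TRIES DELAY COMMAND [ARGS...]
# Polls COMMAND until it prints 0 or MAX_TRIES polls have been made.
# Returns 0 if the connections drained, 1 if the grace period ran out.
wait_for_drain() {
  local max_tries=$1 delay=$2
  shift 2
  local tries=0 used
  while [ "$tries" -lt "$max_tries" ]; do
    used=$("$@" 2>/dev/null) || used=''
    # An empty or non-numeric answer (for example, the admin query failed)
    # counts as "still in use" instead of breaking the numeric comparison.
    case "$used" in
      ''|*[!0-9]*) used=1 ;;
    esac
    if [ "$used" -eq 0 ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 1
}

# Possible use in the hook, with conn_used wrapping the same
# stats_mysql_connection_pool query as in the script above:
#   wait_for_drain 20 0.05 conn_used
```

With 20 tries and a 0.05 second delay this keeps the same grace period of roughly one second as the original loop, not counting the time each query takes.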

Now, if we kill the master, Orchestrator will eventually (after 5 minutes) promote a slave and get everything working again. When the former master is brought back up, Orchestrator never brings it back into replication; it has to be made a slave manually.
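
The manual step is roughly the following sketch, assuming binlog file/position replication (no GTID). rejoin_sql is a hypothetical helper, and the host, port, and coordinates are arguments that have to come from the newly promoted master; none of them are values from our environment. It also assumes the old master has no writes that the new master never received, and that replication credentials (MASTER_USER, MASTER_PASSWORD) are configured separately.

```shell
#!/bin/bash
# Sketch only: rejoin_sql is a hypothetical helper, not an Orchestrator command.
#
# Usage: rejoin_sql NEW_MASTER_HOST NEW_MASTER_PORT LOG_FILE LOG_POS
# Prints the MySQL 5.7 statements that repoint a server at a new master
# using binlog file/position coordinates.
rejoin_sql() {
  local host=$1 port=$2 log_file=$3 log_pos=$4
  cat <<EOF
SET GLOBAL read_only = 1;
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='${host}', MASTER_PORT=${port}, MASTER_LOG_FILE='${log_file}', MASTER_LOG_POS=${log_pos};
START SLAVE;
EOF
}

# The output could be reviewed and then piped to the old master, e.g.:
#   rejoin_sql NEW_MASTER_HOST 3306 LOG_FILE LOG_POS | mysql -h OLD_MASTER_HOST
```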

When we try to do a graceful master takeover with a slave from the CLI, it refuses, saying ERROR Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatmysql03:3306 turns to be too complex; please do it manually.

When we try with the GUI (by dragging the slave "on top of" the master), it also refuses, saying Desginated instance stg1wpplatgarbd02:3306 cannot take over all of its siblings. Error: 2019-03-01 12:13:19 ERROR Relocating 1 replicas of stg1wpplatmysql04:3306 below stg1wpplatgarbd02:3306 turns to be too complex; please do it manually. We also get the following in the log:

Mar  1 10:39:28 stg1wpplatdbmgr01 orchestrator: 2019-03-01 10:39:28 INFO moveReplicasViaGTID: Will move 1 replicas below stg1wpplatmysql03:3306 via GTID

However, we're not using GTID. When we query Orchestrator through the API, it reports:

# curl -s http://localhost:3000/api/problems | jq
[
  {
    "Key": {
      "Hostname": "stg1wpplatmysql04",
      "Port": 3306
    },
    "InstanceAlias": "",
    "Uptime": 78686,
    "ServerID": 2,
    "ServerUUID": "1dcecf18-3b05-11e9-8732-0050568411f5",
    "Version": "5.7.23-23-57-log",
    "VersionComment": "Percona XtraDB Cluster (GPL), Release rel23, Revision f5578f0, WSREP version 31.31, wsrep_31.31",
    "FlavorName": "Percona",
    "ReadOnly": false,
    "Binlog_format": "ROW",
    "BinlogRowImage": "FULL",
    "LogBinEnabled": true,
    "LogSlaveUpdatesEnabled": false,
    "SelfBinlogCoordinates": {
      "LogFile": "mysql-bin.000007",
      "LogPos": 1628009,
      "Type": 0
    },
...
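
To compare the replication-related fields across instances without reading the full JSON, a jq filter along these lines could be used. This is a sketch: summarize_instances is a hypothetical helper name, and the choice of fields is only a guess at what is relevant here. The field names themselves are taken from the output above.

```shell
#!/bin/bash
# Sketch only: summarize_instances is a hypothetical helper name.
# Reads the JSON array returned by /api/problems on stdin and prints one
# tab-separated line per instance:
#   Hostname  Port  ReadOnly  LogBinEnabled  LogSlaveUpdatesEnabled
summarize_instances() {
  jq -r '.[] | [.Key.Hostname, .Key.Port, .ReadOnly, .LogBinEnabled, .LogSlaveUpdatesEnabled] | @tsv'
}

# Usage, against the same endpoint as above:
#   curl -s http://localhost:3000/api/problems | summarize_instances
```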

I don't know whether this is a documentation issue, a bug, or a combination. (I strongly suspect there's some configuration that would fix this if we only knew how to do it, which would make it a doc issue, I suppose.)
